6.6 years ago
Payal
Hi,
I am analyzing RNA-seq data with EGFP, and I need to get the EGFP counts for each sample. My plan is to align the FASTQ files to the reference genome with the EGFP sequence concatenated to it, and then get the counts using htseq-count. I have the EGFP.fa file, but how do I create the EGFP GTF file that htseq-count requires for annotation?
EGFP sequence (is this OK?):
>EGFP
ATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAA
GTTCAGCGTGTCCGGCGAGGGCGAGGGCGATGCCACCTACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGC
TGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAG
CAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTTCTTCAAGGACGACGGCAACTA
CAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGG
ACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTACAACAGCCACAACGTCTATATCATGGCCGACAAGCAGAAGAAC
GGCATCAAGGTGAACTTCAAGATCCGCCACAACATCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACAC
CCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCGCCCTGAGCAAAGACCCCAACG
AGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGTAA
GTF (is this OK?):
EGFP EGFP exon 1 720 0.000000 + . gene_id "EGFP"; transcript_id "EGFP";
Thanks, Payal
If your EGFP.fa file looks like this
then add a line at the end of your GTF file with something like this (the chromosome name EGFP needs to match in both files; put the actual length of the sequence in place of 1000)
This helped me a lot.
Just another update on this: You'll need to add "transcript_id" to the GTF file to get it to work.
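For anyone who wants to generate that GTF line rather than type it by hand, here is a minimal sketch, assuming a plain FASTA file and a single exon spanning the whole sequence. The function names `fasta_lengths` and `gtf_exon_line` and the source label "custom" are made up for illustration and are not part of HTSeq or any other tool. It takes the sequence name from the FASTA header so the name matches in both files, uses the measured length as the end coordinate, and writes both gene_id and transcript_id, with the nine GTF columns separated by tabs.

```python
import io


def fasta_lengths(handle):
    """Return {sequence_name: length} for every record in an open FASTA file."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the header as the sequence name.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def gtf_exon_line(seqname, length, source="custom", strand="+"):
    """Build one tab-separated GTF exon line covering positions 1..length."""
    # "custom" is an arbitrary placeholder for the source column.
    attributes = f'gene_id "{seqname}"; transcript_id "{seqname}";'
    fields = [seqname, source, "exon", "1", str(length), ".", strand, ".", attributes]
    return "\t".join(fields)


if __name__ == "__main__":
    # Toy record standing in for EGFP.fa; replace with open("EGFP.fa").
    demo = io.StringIO(">EGFP demo\nATGGTGAGCA\nAGGGCGAG\n")
    for name, length in fasta_lengths(demo).items():
        print(gtf_exon_line(name, length))
```

The printed line can be appended to the end of the genome GTF, with the EGFP record appended to the genome FASTA in the same way before building the aligner index.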