Question

Mrna Gtf File

11.1 years ago

sridhar2bioinfo ▴ 20

Dear Team,

I am working on mRNA gene expression in human tumors. I ran tophat followed by cufflinks using the hg19 reference.

Now I wish to annotate the cufflinks output with the mRNA GTF file.

My doubts are:
1. Should I map my data against mRNA.fa or the hg19 reference when running tophat?
2. Could you suggest where to get the mRNA GTF file?

I tried downloading the mRNA GTF file from the UCSC Table Browser with the settings below:

Group: Genes and Gene Prediction Tracks
Track: RefSeq Genes
Table: Human mRNA (all_mrna)

The output file is around 200 MB, but I heard there are only about 22000 known mRNAs.

Could you suggest where to get the mRNA.gtf file and which reference to use for mapping?

Thanks Sri

tophat2 cufflinks rnaseq ucsc • 3.5k views

updated 11.1 years ago by Devon Ryan 104k • written 11.1 years ago by sridhar2bioinfo ▴ 20

score 0 · Answer 1 · 2013-10-15

Nitpick: You have a "question", not a "doubt". I've seen a lot of people use this phrase, so I assume that this is incorrectly taught in some country (or countries).

1. Align to the hg19 reference. Since you're using tophat, give it the GTF and it will first align to the transcriptome and then convert the mapping coordinates back to the genome for you.
2. If you downloaded the genome from UCSC, then get the annotation file from there too. Don't try to mix a reference genome from Ensembl with an annotation from UCSC (or vice versa), as the chromosome names are different. Alternatively, just download the appropriate bundle from iGenomes and you'll have matched bowtie indices and annotation files. That's rather convenient.
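
To make the chromosome-name point concrete, here is a minimal Python sketch that compares the sequence names in a genome FASTA with the first column of a GTF. The function names and the toy data are made up for illustration; in practice you would open your own hg19 FASTA and annotation file instead of the in-memory strings.

```python
import io


def fasta_names(handle):
    """Collect sequence names from the '>' header lines of a FASTA file."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            # The name is the first whitespace-delimited token after '>'.
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_seqnames(handle):
    """Collect the values of the first (seqname) column of a GTF file."""
    names = set()
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        names.add(line.split("\t", 1)[0])
    return names


def missing_from_genome(fasta_handle, gtf_handle):
    """Return GTF sequence names that do not occur in the FASTA."""
    return gtf_seqnames(gtf_handle) - fasta_names(fasta_handle)


# Toy data (hypothetical): a UCSC-style genome with "chr" prefixes,
# one GTF line using Ensembl-style names and one using UCSC-style names.
TOY_FASTA = ">chr1\nACGT\n>chr2\nGGCC\n"
TOY_ENSEMBL_GTF = '1\tsrc\texon\t1\t4\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
TOY_UCSC_GTF = 'chr1\tsrc\texon\t1\t4\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'

if __name__ == "__main__":
    for label, gtf_text in (("Ensembl-style", TOY_ENSEMBL_GTF),
                            ("UCSC-style", TOY_UCSC_GTF)):
        missing = missing_from_genome(io.StringIO(TOY_FASTA),
                                      io.StringIO(gtf_text))
        print(label, "names missing from genome:", sorted(missing))
```

If nothing is reported as missing, the annotation should be usable with that genome, and it can then be handed to tophat through its `-G/--GTF` option.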

Regarding the size of the GTF from UCSC, that 22000 number refers more to the number of genes. Each gene can (and often does) have many different transcripts. That really balloons the size of the file. It may seem large, but that's not unreasonable.
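
If you want to see the gene versus transcript difference in your own file, a rough Python sketch like the following counts distinct `gene_id` and `transcript_id` values in a GTF. The function name and the toy lines are invented for illustration, and it assumes the usual `key "value";` attribute layout in column 9. One caveat: as far as I know, GTF exported from the UCSC Table Browser reuses the transcript name as the `gene_id`, so the counts are only meaningful for an annotation with real gene identifiers (the iGenomes file, for example).

```python
import io
import re
from collections import defaultdict

# Matches attribute pairs of the form: key "value"
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def transcripts_per_gene(handle):
    """Map each gene_id to the set of transcript_ids seen for it."""
    genes = defaultdict(set)
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = dict(ATTRIBUTE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        transcript_id = attrs.get("transcript_id")
        if gene_id and transcript_id:
            genes[gene_id].add(transcript_id)
    return genes


# Toy annotation (hypothetical): gene G1 has two transcripts, G2 has one.
TOY_GTF = (
    'chr1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
    'chr1\tsrc\texon\t200\t300\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
    'chr1\tsrc\texon\t1\t150\t.\t+\t.\tgene_id "G1"; transcript_id "T2";\n'
    'chr2\tsrc\texon\t10\t90\t.\t-\t.\tgene_id "G2"; transcript_id "T3";\n'
)

if __name__ == "__main__":
    genes = transcripts_per_gene(io.StringIO(TOY_GTF))
    n_transcripts = sum(len(ids) for ids in genes.values())
    print("genes:", len(genes))
    print("transcripts:", n_transcripts)
```

Since a GTF typically has one line per exon (plus CDS and other features) for every transcript, the line count grows much faster than the gene count, which is where the file size comes from.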