mRNA GTF File
11.1 years ago

Dear Team,

I am working on mRNA gene expression in human tumors. I ran TopHat followed by Cufflinks using the hg19 reference.

Now I wish to annotate the Cufflinks output with the mRNA GTF file.

My doubts are:

  1. Should I map my data to mRNA.fa or to the hg19 reference when running TopHat?
  2. Could you suggest where to get the mRNA GTF file?

I tried downloading the mRNA GTF file from the UCSC Table Browser with the following settings:

Group: Genes and Gene Prediction Tracks; Track: RefSeq Genes; Table: Human mRNA (all_mrna)

The output file is around 200 MB, but I have heard that there are only about 22000 known mRNAs.

Could you suggest where to get the mRNA.gtf file and which reference to use during mapping?

Thanks,
Sri

tophat2 cufflinks rnaseq ucsc • 3.5k views
11.1 years ago

Nitpick: You have a "question", not a "doubt". I've seen a lot of people use this phrase, so I assume that this is incorrectly taught in some country (or countries).

  1. Align to the hg19 reference. Since you're using TopHat, give it the GTF (the -G/--GTF option) and it will first align the reads to the transcriptome and then convert the mapping coordinates back to the genome for you.
  2. If you downloaded the genome from UCSC, then get the annotation file from there too. Don't try to mix a reference genome from Ensembl with an annotation from UCSC (or vice versa), as the chromosome names are different. Alternatively, just download the appropriate bundle from iGenomes and you'll have matched bowtie indices and annotation files. That's rather convenient.
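
To make the chromosome-name point concrete, here is a minimal Python sketch that lists sequence names used in a GTF but missing from a reference FASTA. The function names and the toy data are made up for illustration; on real data you would pass open file handles for your own annotation and genome files.

```python
import io


def gtf_seqnames(handle):
    """Collect the sequence names found in column 1 of a GTF stream."""
    names = set()
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        names.add(line.split("\t", 1)[0])
    return names


def fasta_seqnames(handle):
    """Collect the sequence names from the '>' header lines of a FASTA stream."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])  # name is the first word after '>'
    return names


def unmatched_seqnames(gtf_handle, fasta_handle):
    """Return the sorted GTF sequence names that are absent from the FASTA."""
    return sorted(gtf_seqnames(gtf_handle) - fasta_seqnames(fasta_handle))


# Toy data (hypothetical): a UCSC-style GTF line and an Ensembl-style FASTA.
TOY_GTF = "\t".join(
    ["chr1", "toy", "exon", "100", "200", ".", "+", ".",
     'gene_id "geneA"; transcript_id "tx1";']
) + "\n"
TOY_ENSEMBL_FASTA = ">1 dna:chromosome\nACGT\n>2 dna:chromosome\nACGT\n"

missing = unmatched_seqnames(io.StringIO(TOY_GTF), io.StringIO(TOY_ENSEMBL_FASTA))
print("GTF sequence names missing from the FASTA:", missing)
```

An empty list means every annotated sequence name exists in the reference. A list containing essentially every chromosome is the typical sign of a UCSC/Ensembl mix-up.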

Regarding the size of the GTF from UCSC, that 22000 number refers more to the number of genes. Each gene can (and often does) have many different transcripts. That really balloons the size of the file. It may seem large, but that's not unreasonable.
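
If you want to check this on your own file, a rough Python sketch like the one below counts distinct gene_id and transcript_id values in a GTF. It assumes the usual GTF attribute syntax (key "value";) in column 9; the function name and the toy records are hypothetical.

```python
import io
import re

# Assumes standard GTF attributes such as: gene_id "X"; transcript_id "Y";
GENE_ID = re.compile(r'gene_id "([^"]+)"')
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def count_genes_and_transcripts(handle):
    """Return (number of distinct gene_ids, number of distinct transcript_ids)."""
    genes, transcripts = set(), set()
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        gene = GENE_ID.search(fields[8])
        transcript = TRANSCRIPT_ID.search(fields[8])
        if gene:
            genes.add(gene.group(1))
        if transcript:
            transcripts.add(transcript.group(1))
    return len(genes), len(transcripts)


def _row(start, end, gene, transcript):
    """Build one toy exon record (illustrative data only)."""
    attrs = f'gene_id "{gene}"; transcript_id "{transcript}";'
    return "\t".join(["chr1", "toy", "exon", str(start), str(end), ".", "+", ".", attrs])


# Two genes, three transcripts, four exon lines.
TOY_GTF = "\n".join([
    _row(100, 200, "geneA", "tx1"),
    _row(300, 400, "geneA", "tx1"),
    _row(100, 250, "geneA", "tx2"),
    _row(900, 990, "geneB", "tx3"),
]) + "\n"

n_genes, n_transcripts = count_genes_and_transcripts(io.StringIO(TOY_GTF))
print(f"{n_genes} genes, {n_transcripts} transcripts")
```

Because a GTF carries one line per feature (exons, CDS and so on), the line count grows with transcripts times features per transcript, not with the gene count. One caveat, as far as I know: some GTF exports (the UCSC Table Browser has been known to do this) put the transcript name in gene_id as well, in which case the two counts come out equal.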
