Where Do I Get A GTF File With Proper Gene IDs For Read Counting?
12.0 years ago
Ryan Thompson ★ 3.6k

I've just noticed that the knownGene.gtf file you can download from the UCSC browser (e.g. for human) sets each transcript's gene_id to the same value as its transcript_id. This causes problems when counting the number of reads mapping to each gene: every isoform is treated as its own gene, so reads that align to exons shared between isoforms of a multi-isoform gene look ambiguous and are discarded. Is there any way to get these transcripts properly grouped into genes, so that I can assign them real gene IDs and count my reads properly?

Alternatively, is there some other human gene annotation I should be using for read counting?
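
For anyone who wants to check whether their own GTF has this problem, here is a minimal sketch of the test I mean. It assumes the usual tab-separated GTF layout with `key "value";` pairs in the ninth column; the function name and the toy IDs (`txA`, `txB`) are made up for illustration and are not from any real annotation.

```python
import re

# Matches attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def count_self_named_genes(gtf_lines):
    """Return (n_same, n_total): how many transcripts have
    gene_id == transcript_id, and how many transcripts were seen."""
    gene_of = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        attrs = dict(ATTR_RE.findall(cols[8]))
        gid = attrs.get("gene_id")
        tid = attrs.get("transcript_id")
        if gid is not None and tid is not None:
            gene_of[tid] = gid
    n_same = sum(1 for tid, gid in gene_of.items() if tid == gid)
    return n_same, len(gene_of)


# Toy records (hypothetical IDs) mimicking the layout described above
toy_gtf = [
    'chr1\tknownGene\texon\t100\t200\t.\t+\t.\tgene_id "txA"; transcript_id "txA";\n',
    'chr1\tknownGene\texon\t100\t200\t.\t+\t.\tgene_id "txB"; transcript_id "txB";\n',
]
n_same, n_total = count_self_named_genes(toy_gtf)
```

If `n_same` equals `n_total` on a real file (pass an open file handle instead of the toy list), the annotation has no usable gene grouping and per-gene counting will behave as described above.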

rna-seq annotation • 5.7k views
12.0 years ago
JC 13k

Use the Ensembl annotation: its genes have different IDs (ENSG\d+) from its transcripts (ENST\d+), so the transcripts are already grouped by gene.
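
To illustrate why distinct gene and transcript IDs matter, here is a rough sketch that groups transcripts by gene_id from GTF lines. It assumes standard `key "value";` attributes in the ninth column; the function name and the toy IDs below are placeholders for illustration, not real Ensembl accessions.

```python
import re
from collections import defaultdict

# Matches attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def transcripts_by_gene(gtf_lines):
    """Map each gene_id to the set of transcript_ids annotated under it."""
    genes = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        attrs = dict(ATTR_RE.findall(cols[8]))
        if "gene_id" in attrs and "transcript_id" in attrs:
            genes[attrs["gene_id"]].add(attrs["transcript_id"])
    return dict(genes)


# Toy records: two isoforms of one gene sharing an exon (placeholder IDs)
toy_gtf = [
    'chr1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "GENE_TOY_1"; transcript_id "TX_TOY_1";\n',
    'chr1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "GENE_TOY_1"; transcript_id "TX_TOY_2";\n',
]
grouping = transcripts_by_gene(toy_gtf)
```

With an annotation like this, both isoforms fall under a single gene, so a read on the shared exon can be assigned to that one gene rather than being split between two apparent genes.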

10.7 years ago

I think you can download what you need from Illumina iGenomes:

http://cufflinks.cbcb.umd.edu/igenomes.html
