Most "Accurate" Gtf File For Cufflinks??
13.2 years ago
Sakti ▴ 20

Hi,

Recently I've been using Cufflinks to analyze some mouse RNA-seq data. However, when trying to decide which GTF file to use, I ran into a conundrum: the files all differ from one another! For example, the latest UCSC RefGene file has 267,796 lines, while the NCBI one has 1,259,220. I think roughly 1 million extra entries are very likely to affect the Cufflinks results.

So my question is: what do these different files include, and where is each of them explained? I've never been able to find a site that explains what is included in each annotation (i.e. protein-coding genes, non-coding genes, pseudogenes, etc.). And where can I find a GTF file that reports only manually curated genes (Vega?)?

Also, in your experience, what would be the best GTF file to use?

Thanks! Any comment will be highly appreciated.

cufflinks gtf rna database • 13k views
13.2 years ago
Marina Manrique ★ 1.3k

Hi!

I'm currently using the UCSC hg19 iGenome (http://tophat.cbcb.umd.edu/igenomes.html); there is another iGenome for mouse. iGenomes include, among many other interesting things, a genes.gtf file correctly formatted for TopHat and Cufflinks. Indeed, the TopHat site recommends using these files:

"These files are augmented with the special attributes Cufflinks needs to perform differential splicing and promoter analysis. We strongly encourage users to download and try these packages!"
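A quick way to check whether a given GTF carries such extra attributes is to count the attribute keys in column 9. The sketch below is only illustrative: it assumes the attributes in question are `tss_id` and `p_id` (the names the Cufflinks documentation describes for Cuffdiff's promoter and CDS analyses), and the function name and sample records are made up for the example.

```python
import re
from collections import Counter

# Matches GTF attribute pairs such as: gene_id "Xkr4";
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def count_attribute_keys(lines):
    """Count how many GTF records carry each attribute key (column 9)."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed 9-column GTF record
        # Use a set so a key repeated within one record is counted once.
        counts.update({key for key, _ in ATTR_RE.findall(fields[8])})
    return counts


# Hypothetical records, for illustration only (not taken from a real file).
sample = [
    'chr1\tunknown\texon\t100\t200\t.\t+\t.\t'
    'gene_id "A"; transcript_id "A.1"; tss_id "TSS1"; p_id "P1";\n',
    'chr1\tunknown\texon\t300\t400\t.\t+\t.\t'
    'gene_id "A"; transcript_id "A.1";\n',
]

counts = count_attribute_keys(sample)
for key in ("gene_id", "transcript_id", "tss_id", "p_id"):
    print(key, counts[key])
```

For a real file, pass an open file handle instead of the sample list, e.g. `count_attribute_keys(open("genes.gtf"))`. A count of zero for `tss_id` or `p_id` would suggest the file has not been augmented in this way.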

In particular, the genes.gtf file included in the UCSC hg19 iGenome contains the refFlat.txt.gz table from UCSC, and this table has 'native refSeqs that align one or more times to the corresponding genome', according to this message on the UCSC mailing list: https://lists.soe.ucsc.edu/pipermail/genome/2008-June/016584.html

So I suppose all RefSeq sequences that map one or more times to the hg19 genome are included, including genes, miRNAs, lincRNAs, etc. The same would hold for the mouse genome.
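Rather than supposing, the contents of a GTF can be tallied directly. Raw line counts are hard to compare between sources, because one file may hold only exon records while another adds CDS, start_codon, or gene-level rows. The sketch below counts records per feature type, distinct genes and transcripts, and genes per biotype. It is a rough illustration under stated assumptions: the biotype attribute is looked up under `gene_biotype` or `gene_type` (names used by Ensembl and GENCODE files respectively, as far as I know; RefSeq-derived files may have neither), and the function name and sample records are hypothetical.

```python
import re
from collections import Counter

# Matches GTF attribute pairs such as: gene_id "Xkr4";
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def summarize_gtf(lines, biotype_keys=("gene_biotype", "gene_type")):
    """Tally feature types, distinct genes/transcripts, and gene biotypes."""
    features = Counter()
    transcripts = set()
    gene_biotype = {}  # gene_id -> biotype string, or None if never seen
    records = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed 9-column GTF record
        records += 1
        features[fields[2]] += 1  # column 3 is the feature type
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        if gene_id is not None:
            # Assumption: the biotype lives under one of biotype_keys.
            biotype = next((attrs[k] for k in biotype_keys if k in attrs), None)
            gene_biotype[gene_id] = gene_biotype.get(gene_id) or biotype
        if "transcript_id" in attrs:
            transcripts.add(attrs["transcript_id"])
    return {
        "records": records,
        "features": features,
        "genes": len(gene_biotype),
        "transcripts": len(transcripts),
        "biotypes": Counter(b or "not annotated" for b in gene_biotype.values()),
    }


# Hypothetical records, for illustration only (not taken from a real file).
sample = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\t'
    'gene_id "A"; transcript_id "A.1"; gene_biotype "protein_coding";\n',
    'chr1\tsrc\tCDS\t120\t200\t.\t+\t0\t'
    'gene_id "A"; transcript_id "A.1"; gene_biotype "protein_coding";\n',
    'chr2\tsrc\texon\t500\t900\t.\t-\t.\t'
    'gene_id "B"; transcript_id "B.1"; gene_biotype "lincRNA";\n',
]

summary = summarize_gtf(sample)
for name, value in summary.items():
    print(name, value)
```

Comparing the `genes` and `transcripts` figures across two annotation files is usually more informative than comparing their line counts.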

HTH

Marina

Updated link for the TopHat iGenomes:

http://ccb.jhu.edu/software/tophat/igenomes.shtml

11.6 years ago
Rm 8.3k

How about the current release (16) of the GENCODE annotations?

http://www.gencodegenes.org/releases/16.html

Version 16 (November 2012 freeze, GRCh37) - Ensembl 71

Statistics:
