GTF file for htseq-count
10.1 years ago
Tawfiq ▴ 10

htseq-count needs a .gtf file, and the tutorial says we cannot use one from UCSC. Does anyone know a source for an hg19 GTF?

EDIT: Post title edited by Ashutosh

RNA-Seq • 12k views
ADD COMMENT
10.1 years ago
komal.rathi ★ 4.1k

Which tutorial? By the way, it's best that you use the Ensembl GTF when running htseq-count.

ADD COMMENT

Oh, sorry, I meant the HTSeq 0.6.1p2 documentation, in the answer to one of the common questions.

Thanks a lot, I got one and it works for me.

ADD REPLY

I have moved the comment to an answer.

ADD REPLY
10.1 years ago
EagleEye 7.6k

You can also use the GTF from Gencode (I am using it without any problem). By the way, GTF files from any repository should work with HTSeq.

http://www.gencodegenes.org/

ADD COMMENT

It is true that the Gencode GTF works fine with htseq-count; I have used it as well. But I'd be cautious before saying that other GTFs (especially UCSC) work as well as Gencode and Ensembl. I have observed that some programs, like the Python scripts in DEXSeq and even some Cufflinks programs like cuffcompare, work really well with Ensembl but not with Gencode.

ADD REPLY

Can you please post the errors you get with the Gencode GTF, and mention the Gencode version? That would help others know about the problem and fix it.

ADD REPLY

EagleEye Sure. Sometime soon.

Update:

Alright, so I found my own question that I posted a couple of months(?) back. I couldn't figure out what was wrong until I switched my GTF to Ensembl and things started chugging along. By the way, my pipeline had gotten stuck at the differential expression stage, in the cuffdiff program.

Update2:

Quoted from DEXSeq Manual Section 2.4:

We have tested our tools chiefly with GTF files from Ensembl and hence recommend to prefer these, as files from other providers sometimes do not adhere fully to the GTF standard and cause the preprocessing to fail.

ADD REPLY
Yes, I agree that the UCSC GTF will not work properly. Thanks for mentioning it; I should have stated that clearly.
ADD REPLY

komal.rathi and EagleEye

I just thought I'd share this:

htseq-count counts each aligned read at most once; if a read aligns to two different transcripts of the same gene, it is counted once for that gene.

Whatever GTF you use, it needs to indicate which transcripts belong to the same gene, e.g. exon lines from two transcripts of the same gene should have the same gene_id but different transcript_id values.

We cannot use the UCSC Table Browser GTF because it sets gene_id to the same value as transcript_id, so htseq-count treats every transcript as a separate gene and discards reads that overlap more than one transcript as ambiguous.

So the thing to check in your GTF is that gene_id and transcript_id differ; then htseq-count works as intended.
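
A rough way to run the check described above before handing a GTF to htseq-count is sketched here. It is only a sketch: the function name `gene_transcript_summary` and the sample lines are made up for illustration, and it assumes the usual nine tab-separated GTF columns with `key "value";` attributes in the last one.

```python
import re
from collections import defaultdict

# Matches GTF attributes written as: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gene_transcript_summary(gtf_lines, feature="exon"):
    """Summarise gene_id / transcript_id usage for one feature type.

    Returns (genes, identical):
      genes     maps each gene_id to the set of transcript_ids seen with it
      identical counts lines whose gene_id equals their transcript_id
    """
    genes = defaultdict(set)
    identical = 0
    for line in gtf_lines:
        # Skip comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 3 is the feature type, column 9 holds the attributes.
        if len(fields) < 9 or fields[2] != feature:
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        transcript_id = attrs.get("transcript_id")
        if gene_id is None or transcript_id is None:
            continue
        genes[gene_id].add(transcript_id)
        if gene_id == transcript_id:
            identical += 1
    return genes, identical


# Hypothetical lines: two transcripts grouped under one gene.
ensembl_like = [
    '1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    '1\tsrc\texon\t150\t300\t.\t+\t.\tgene_id "G1"; transcript_id "T2";\n',
]
# Hypothetical lines: gene_id simply repeats transcript_id.
ucsc_like = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "T1"; transcript_id "T1";\n',
    'chr1\tsrc\texon\t150\t300\t.\t+\t.\tgene_id "T2"; transcript_id "T2";\n',
]

for name, lines in (("ensembl_like", ensembl_like), ("ucsc_like", ucsc_like)):
    genes, identical = gene_transcript_summary(lines)
    print(name, "genes:", len(genes), "lines with gene_id == transcript_id:", identical)
```

For a real annotation you would pass an open file handle instead of the sample lists. If nearly every exon line has gene_id equal to transcript_id, the file probably does not group transcripts by gene and is worth replacing before counting.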

ADD REPLY

I am facing the same problem with HTSeq. I downloaded the GTF from the UCSC Genome Browser, and I am using NCBI's RefSeq (human transcriptome) as a reference. For this reference, what is the best way to get a GTF file for HTSeq?

Thank you in advance.

ADD REPLY
