Question

gtf file for htseq count

1

10.5 years ago

Tawfiq ▴ 10

htseq-count needs a .gtf file, and the tutorial says we cannot use the one from UCSC. Does anyone know a source for an hg19 GTF?

EDIT: Post title edited by Ashutosh

RNA-Seq • 13k views

ADD COMMENT • link updated 3.2 years ago by Ram 45k • written 10.5 years ago by Tawfiq ▴ 10

Ram · Answer 1 · 2014-11-12

3

10.5 years ago

komal.rathi ★ 4.1k

Which tutorial? By the way, it's best that you use the Ensembl GTF when running htseq-count.

ADD COMMENT • link updated 3.2 years ago by Ram 45k • written 10.5 years ago by komal.rathi ★ 4.1k

0

Oh, sorry, I meant the HTSeq 0.6.1p2 documentation, in the answer to one of the common questions.

Thanks a lot, I got one and it works for me.

ADD REPLY • link 10.5 years ago by Tawfiq ▴ 10

0

I have moved the comment to an answer.

ADD REPLY • link updated 3.2 years ago by Ram 45k • written 10.5 years ago by Ashutosh Pandey 12k

Ram · Answer 2 · 2014-11-13

1

10.5 years ago

EagleEye 7.6k

You can also use the GTF from Gencode (I am using it without any problem). And by the way, GTF files from any repository should work with HTSeq.

http://www.gencodegenes.org/

ADD COMMENT • link updated 3.2 years ago by Ram 45k • written 10.5 years ago by EagleEye 7.6k

1

It is true that the Gencode GTF works fine with htseq-count; I have used it as well. But I'd be cautious before saying that GTFs from other sources (especially UCSC) work as well as those from Gencode and Ensembl. I have observed that some programs, like the Python scripts in DEXSeq and even some Cufflinks programs such as cuffcompare, work really well with Ensembl but not with Gencode.

ADD REPLY • link updated 3.2 years ago by Ram 45k • written 10.5 years ago by komal.rathi ★ 4.1k

0

Can you please post the errors you get with the Gencode GTF, and also mention the Gencode version? That would help others know about the problem and fix it.

ADD REPLY • link 10.5 years ago by EagleEye 7.6k

0

EagleEye Sure. Sometime soon.

Update:

Alright, so I found my own question that I posted a couple of months(?) back. I couldn't figure out what was wrong until I changed my GTF to Ensembl and things started chugging along. By the way, my pipeline got stuck at the differential expression stage, in the cuffdiff program.

Update2:

Quoted from DEXSeq Manual Section 2.4:

We have tested our tools chiefly with GTF files from Ensembl and hence recommend to prefer these, as files from other providers sometimes do not adhere fully to the GTF standard and cause the preprocessing to fail.

ADD REPLY • link updated 5.5 years ago by Ram 45k • written 10.5 years ago by komal.rathi ★ 4.1k

0

Yes, I agree that the UCSC GTF will not work properly. Thanks for mentioning it. I should have stated that clearly.

ADD REPLY • link 10.5 years ago by EagleEye 7.6k

0

komal.rathi and EagleEye

I just thought I'd share this:

htseq-count counts each aligned read at most once; if a read aligns to two different transcripts of the same gene, it is counted once, for the gene they belong to.

Whatever GTF you use, it needs to indicate which transcripts belong to the same gene, e.g. exon lines from two transcripts of the same gene should have the same gene_id but different transcript_id values.

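I know that we cannot use the UCSC Table Browser GTF because it puts the same value in gene_id and transcript_id, so htseq-count treats every transcript as a separate gene and loses all the reads that overlap more than one transcript (they are counted as ambiguous).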
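To make that concrete, here is a toy sketch of the counting rule. It is not HTSeq's actual code, only a simplified imitation of the default union mode, and the function name `count_reads` and the IDs `GENE_A`, `TX_1` and `TX_2` are made up for illustration. Each read is represented by the set of gene_id values of the features it overlaps.

```python
from collections import Counter

def count_reads(read_hits):
    """Toy imitation of union-mode counting (hypothetical helper).

    read_hits: list of sets; each set holds the gene_id values of the
    features one read overlaps. A read is assigned to a gene only if
    it overlaps exactly one gene_id.
    """
    counts = Counter()
    for genes in read_hits:
        if len(genes) == 0:
            counts["__no_feature"] += 1
        elif len(genes) == 1:
            counts[next(iter(genes))] += 1
        else:
            counts["__ambiguous"] += 1
    return counts

# One read overlapping an exon shared by two transcripts of one gene.
# Ensembl-style GTF: both transcripts carry the same gene_id.
ensembl_style = count_reads([{"GENE_A"}])
# UCSC Table Browser style: gene_id is a copy of transcript_id,
# so the same read appears to hit two different "genes".
ucsc_style = count_reads([{"TX_1", "TX_2"}])

print(dict(ensembl_style))
print(dict(ucsc_style))
```

With the shared gene_id the read is assigned to the gene; with transcript-level IDs in gene_id the same read ends up in the ambiguous bucket.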

All we need to check in our GTF is that gene_id and transcript_id are different; then htseq-count works best.
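A quick way to run this check is sketched below, assuming a standard tab-separated GTF with nine columns and attributes written as `key "value";`. The function name `same_id_fraction` and the example lines are hypothetical, not taken from any real annotation file.

```python
import re

# Matches attribute pairs such as: gene_id "ENSG00000000001";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')

def same_id_fraction(gtf_lines, feature="exon"):
    """Fraction of `feature` lines whose gene_id equals transcript_id."""
    total = same = 0
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature:
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" not in attrs or "transcript_id" not in attrs:
            continue
        total += 1
        same += attrs["gene_id"] == attrs["transcript_id"]
    return same / total if total else 0.0

# Made-up example lines for illustration only.
good = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tsrc\texon\t100\t300\t.\t+\t.\tgene_id "G1"; transcript_id "T2";',
]
bad = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "T1"; transcript_id "T1";',
    'chr1\tsrc\texon\t100\t300\t.\t+\t.\tgene_id "T2"; transcript_id "T2";',
]

print(same_id_fraction(good))
print(same_id_fraction(bad))
```

On a real file you could pass an open file handle instead of a list, e.g. `with open("genes.gtf") as fh: print(same_id_fraction(fh))`, where `genes.gtf` is a placeholder path. A fraction close to 1 suggests the file has the UCSC Table Browser problem described above.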

ADD REPLY • link updated 3.2 years ago by Ram 45k • written 10.5 years ago by Manvendra Singh ★ 2.2k

0

I am facing the same problem with HTSeq. I downloaded the GTF from the UCSC Genome Browser, and I am using NCBI's RefSeq (human transcriptome) as a reference. For this reference, what is the best way to get a GTF file for HTSeq?

Thank you in advance.

ADD REPLY • link 7.6 years ago by KVC_bioinfo ▴ 600