For htseq-count we need a .gtf file, and the tutorial says we cannot use the one from UCSC. Does anyone know where to get a suitable GTF for hg19?
EDIT: Post title edited by Ashutosh
Which tutorial? By the way, it's best that you use the Ensembl GTF when running htseq-count.
You can also use the GTF from GENCODE (I am using it without any problem). And by the way, a GTF from any repository should work with HTSeq.
It is true that the Gencode GTF works fine with htseq-count; I have used it as well. But I'd be cautious before saying that other formats (especially UCSC) work as well as Gencode and Ensembl. I have observed that some programs, like the Python scripts in DEXSeq and even some Cufflinks programs such as cuffcompare, work really well with Ensembl but not with Gencode.
EagleEye Sure. Sometime soon.
Update:
Alright, so I found my own question that I posted a couple of months(?) back. I couldn't figure out what's wrong until I changed my GTF to Ensembl and things started chugging along. By the way, my pipeline got stuck at the differential expression stage using the cuffdiff program.
Update2:
Quoted from DEXSeq Manual Section 2.4:
We have tested our tools chiefly with GTF files from Ensembl and hence recommend to prefer these, as files from other providers sometimes do not adhere fully to the GTF standard and cause the preprocessing to fail.
komal.rathi and EagleEye
I just thought I'd share this:
htseq-count counts each aligned read at most once; if a read overlaps two different transcripts of the same gene, it is counted once for that gene.
Whatever GTF you use, it needs to indicate which transcripts belong to the same gene, e.g. exon lines from two transcripts of the same gene should have the same gene_id but different transcript_id values.
As far as I know, we cannot use the UCSC Table Browser GTF because it puts the same value in gene_id and transcript_id, so htseq-count treats every transcript as a separate gene and loses the reads that overlap more than one of them (they end up counted as ambiguous).
So all we need to check in our GTF is that gene_id groups the transcripts of a gene rather than just repeating transcript_id; then htseq-count works best.
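If you want to check a GTF before running htseq-count, something like the rough Python sketch below might help. It only looks at exon lines and reports how often gene_id and transcript_id carry the same value, and how many genes have more than one transcript. The function names (parse_attributes, summarize_gtf) are my own and not part of HTSeq, and the attribute parsing is simplified: it assumes quoted values like gene_id "X"; and ignores unquoted ones.

```python
import re
from collections import defaultdict

# Matches key "value" pairs in the 9th GTF column (quoted values only).
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def parse_attributes(field):
    """Turn a GTF attribute column into a dict, e.g. {'gene_id': 'G1'}."""
    return dict(ATTR_RE.findall(field))


def summarize_gtf(lines):
    """Summarize gene_id / transcript_id usage on the exon lines of a GTF.

    `lines` is any iterable of text lines (an open file works).
    """
    transcripts_per_gene = defaultdict(set)
    exon_lines = 0
    same_id_lines = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue  # only exon features are inspected here
        attrs = parse_attributes(cols[8])
        gene_id = attrs.get("gene_id")
        transcript_id = attrs.get("transcript_id")
        if gene_id is None or transcript_id is None:
            continue  # line lacks one of the two attributes
        exon_lines += 1
        if gene_id == transcript_id:
            same_id_lines += 1
        transcripts_per_gene[gene_id].add(transcript_id)
    multi = sum(1 for t in transcripts_per_gene.values() if len(t) > 1)
    return {
        "exon_lines": exon_lines,
        "same_id_lines": same_id_lines,
        "genes": len(transcripts_per_gene),
        "multi_transcript_genes": multi,
    }
```

To use it, open your own file and pass the handle, e.g. summarize_gtf(open("my_annotation.gtf")), where the file name is just a placeholder. If same_id_lines is close to exon_lines and multi_transcript_genes is 0, the file probably has the gene_id problem described above.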
Oh, sorry, I meant the HTSeq 0.6.1p2 documentation, in the answer to one of the common questions.
Thanks a lot, I got one and it works for me.
I have moved the comment to answer.