htseq-count reports all reads as no feature despite successful alignment
9.0 years ago
bojingjia ▴ 10

I recently aligned some sequencing data with STAR against a prebuilt mm10 genome. Afterwards, I sorted and indexed the BAM files with samtools and generated read counts with htseq-count, using an mm10 Ensembl GTF file. However, every read count is 0, and all reads are classified as no feature.

A number of other users have reported the same problem here on BioStars, but their questions were never resolved. I wondered whether my reads had failed to align, but a quick look at the BAM files in IGV shows many aligned reads. Am I using a faulty mm10 annotation file? Does anyone have suggestions or comments?

htseq-count alignment RNA-Seq samtools • 3.1k views

hi,

A few things to check:

  1. By default, htseq-count expects the BAM to be sorted by read name (-r name). Most aligners return a coordinate-sorted BAM, so either re-sort by name or tell htseq-count the order with -r pos.
  2. Make sure you aren't using a GTF file downloaded from UCSC. The last time I checked, its gene_id values conflicted with its transcript_id values. Read more in the FAQs at the end of this page.
  3. Make sure the chromosome naming style (e.g. chr1 vs. 1) is the same in your BAM and GTF.
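
The sort-order check in the first point can be done by reading the @HD line of the BAM header, where the SAM format records the order in the SO tag. Below is a minimal Python sketch, not a definitive tool: the function names `sort_order` and `read_header` are made up for illustration, `read_header` assumes samtools is installed and on PATH, and the example header is invented.

```python
import subprocess


def sort_order(header_text):
    """Return the SO value from the @HD line of a SAM header, or None if absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None


def read_header(bam_path):
    """Fetch the header text via `samtools view -H` (assumes samtools is on PATH)."""
    result = subprocess.run(
        ["samtools", "view", "-H", bam_path],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


# Invented header for illustration; a real one would come from read_header(path).
example_header = "@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n"
print(sort_order(example_header))
```

If this reports coordinate, one option is to pass -r pos to htseq-count; another is to re-sort by name with samtools sort -n. As far as I know, the order mainly matters for paired-end data.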
9.0 years ago
ablanchetcohen ★ 1.2k

Do the sequence names in the BAM file and the GTF file match?

UCSC and Ensembl use different chromosome nomenclatures.
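
A quick way to test for this mismatch is to compare the sequence names in the BAM header (the SN tag on @SQ lines) with the first column of the GTF. The following is a rough Python sketch under stated assumptions: the helper names are hypothetical, the header text is assumed to come from something like `samtools view -H`, and the sample header and GTF lines are invented for illustration.

```python
def bam_seqnames(header_text):
    """Collect sequence names from the SN tags of @SQ header lines."""
    names = set()
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gtf_seqnames(gtf_lines):
    """Collect sequence names from column 1 of GTF lines, skipping comments."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_seqnames(bam_names, gtf_names):
    """Report which names are shared and which appear on only one side."""
    return {
        "shared": sorted(bam_names & gtf_names),
        "bam_only": sorted(bam_names - gtf_names),
        "gtf_only": sorted(gtf_names - bam_names),
    }


# Invented inputs: a UCSC-style header and Ensembl-style GTF lines.
header = "@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chr2\tLN:1000\n"
gtf = [
    "#!genome-build example\n",
    '1\tsource\texon\t10\t50\t.\t+\t.\tgene_id "GENE_A";\n',
    '2\tsource\texon\t20\t80\t.\t-\t.\tgene_id "GENE_B";\n',
]
print(compare_seqnames(bam_seqnames(header), gtf_seqnames(gtf)))
```

An empty "shared" list, as with these invented inputs, would mean no read can overlap any annotated feature, which matches the all-zero counts described in the question.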

I will never understand why we can sequence the human genome, and put men on the moon, but not agree whether chromosome 1 should be referred to as chr1 or 1.
