Question

htseq-count overcounting non-unique alignments

3

9.3 years ago

jc.szamosi ▴ 50

Hi,

I have some reads that I aligned with STAR aligner version 2.4.2a and then ran through htseq-count version 0.6.1.

I'm noticing a discrepancy between what STAR reports as "Number of reads mapped to multiple loci" (7.1M, ~11% of reads) and what htseq-count reports as __alignment_not_unique (22.3M, ~31% of reads). This is a pretty big discrepancy. I ran the same thing on multiple files, both single-end and paired-end, and while the size of the discrepancy varies, it looks like it's always there.

I was wondering if anyone could help me figure out what's causing this. Shouldn't HTSeq be taking that count directly from the bam file?

Thanks,

Jake

htseq-count RNA-Seq STAR • 6.3k views

ADD COMMENT • link updated 2.3 years ago by Ram 44k • written 9.3 years ago by jc.szamosi ▴ 50

0

What about "% of reads mapped to too many loci"? This is different from "% of reads mapped to multiple loci".

ADD REPLY • link updated 5.1 years ago by Ram 44k • written 9.3 years ago by Ashutosh Pandey 12k

Ram · Answer 1 · 2015-09-10

4

9.3 years ago

michael.ante ★ 3.9k

Hi Jake,

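The __alignment_not_unique value is an alignment count rather than a read count: a read that maps to several loci is counted once by STAR, but once per reported alignment by htseq-count. If you sum up the gene counts, the __no_feature count, and the __ambiguous count, you'll get very close to the number of uniquely mapped reads from the STAR report.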
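A quick way to run that sanity check is to total the htseq-count output yourself. The snippet below is only a sketch: it assumes the usual two-column, tab-separated output in which the special counters start with a double underscore, and the gene names and numbers in `example` are made up for illustration.

```python
def summarize_htseq(lines):
    """Total an htseq-count table (tab-separated: name, count).

    Returns (sum of gene counts, dict of special "__" counters,
    gene counts + __no_feature + __ambiguous).
    """
    gene_total = 0
    special = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, count = line.split("\t")
        if name.startswith("__"):
            # Special counters such as __no_feature or __alignment_not_unique
            special[name] = int(count)
        else:
            gene_total += int(count)
    unique_like = (
        gene_total
        + special.get("__no_feature", 0)
        + special.get("__ambiguous", 0)
    )
    return gene_total, special, unique_like


# Made-up numbers, not taken from the files discussed in this thread.
example = [
    "geneA\t120",
    "geneB\t30",
    "__no_feature\t40",
    "__ambiguous\t10",
    "__too_low_aQual\t0",
    "__not_aligned\t0",
    "__alignment_not_unique\t75",
]

gene_total, special, unique_like = summarize_htseq(example)
print("genes + no_feature + ambiguous:", unique_like)
print("alignment_not_unique:", special["__alignment_not_unique"])
```

The first number is the one to compare against "Uniquely mapped reads number" in STAR's Log.final.out; the second is expected to be larger than STAR's multi-mapped read count.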

Cheers,
Michael

ADD COMMENT • link updated 2.3 years ago by Ram 44k • written 9.3 years ago by michael.ante ★ 3.9k

1

So what you're saying is that STAR counts each multiply-mapped read once, but HTSeq counts them as many times as they're mapped? That makes sense. Thank you!
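To make the difference concrete, here is a small sketch that counts multi-mapped records both ways on a handful of made-up single-end SAM lines. It assumes the aligner writes the NH tag (number of reported alignments for the read), as STAR does; the helper `sam_record` and the read names are hypothetical and exist only to build the example. For paired-end data htseq-count handles mates together, so the ratio will not match this simple version.

```python
def count_multimappers(sam_lines):
    """Return (alignment records with NH > 1, distinct read names with NH > 1)."""
    n_alignments = 0
    read_names = set()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        qname, optional = fields[0], fields[11:]
        # NH:i:<n> gives the number of reported alignments for this read;
        # treat a missing tag as a unique alignment.
        nh = next((int(f[5:]) for f in optional if f.startswith("NH:i:")), 1)
        if nh > 1:
            n_alignments += 1       # per-alignment count (what htseq-count adds up)
            read_names.add(qname)   # per-read count (what STAR's log reports)
    return n_alignments, len(read_names)


def sam_record(qname, flag, pos, mapq, nh):
    """Hypothetical helper: build a minimal single-end SAM line for the example."""
    return "\t".join(
        [qname, str(flag), "chr1", str(pos), str(mapq), "4M",
         "*", "0", "0", "ACGT", "IIII", f"NH:i:{nh}"]
    )


# Made-up records: read1 is unique, read2 maps to 3 loci, read3 to 2 loci.
records = [
    sam_record("read1", 0, 100, 255, 1),
    sam_record("read2", 0, 200, 1, 3),
    sam_record("read2", 256, 300, 1, 3),
    sam_record("read2", 256, 400, 1, 3),
    sam_record("read3", 0, 500, 3, 2),
    sam_record("read3", 256, 600, 3, 2),
]

n_alignments, n_reads = count_multimappers(records)
print("multi-mapped alignments:", n_alignments)  # 5
print("multi-mapped reads:", n_reads)            # 2
```

Two multi-mapped reads produce five alignment records here, which is the same kind of inflation as the 7.1M versus 22.3M in the question.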