htseq-count overcounting non-unique alignments
9.2 years ago
jc.szamosi ▴ 50

Hi,

I have some reads that I aligned with STAR aligner version 2.4.2a and then ran through htseq-count version 0.6.1.

I'm noticing a discrepancy between what STAR reports as "Number of reads mapped to multiple loci" (7.1M, ~11% of reads) and what htseq-count reports as __alignment_not_unique (22.3M, ~31% of reads). This is a pretty big discrepancy. I ran the same thing on multiple files, both single-end and paired-end, and while the size of the discrepancy varies, it looks like it's always there.

I was wondering if anyone could help me figure out what's causing this. Shouldn't HTSeq be taking that count directly from the bam file?

Thanks,

Jake

htseq-count RNA-Seq STAR • 6.3k views

What about "% of reads mapped to too many loci"? This is different from "% of reads mapped to multiple loci".

9.2 years ago
michael.ante ★ 3.9k

Hi Jake,

The __alignment_not_unique value is an alignment count rather than a read count. If you sum up the gene counts, the no-feature count (__no_feature), and the ambiguous count (__ambiguous), you'll get very close to the number of uniquely mapped reads from the STAR report.

Cheers,
Michael
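
A quick way to run that sanity check, as a minimal sketch: it assumes the usual two-column, tab-separated htseq-count output in which the special counters start with a double underscore. The function name `summarize_htseq` and the toy numbers are made up for illustration and are not from the original post.

```python
import io

def summarize_htseq(handle):
    """Split htseq-count output into gene counts and '__' special counters."""
    gene_total = 0
    special = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        name, count = line.split("\t")
        if name.startswith("__"):
            special[name] = int(count)
        else:
            gene_total += int(count)
    # Gene counts + no-feature + ambiguous should approximate STAR's
    # "Uniquely mapped reads number" (per the answer above).
    unique_like = (gene_total
                   + special.get("__no_feature", 0)
                   + special.get("__ambiguous", 0))
    return gene_total, special, unique_like

# Toy data (hypothetical values), standing in for a real counts file.
toy_counts = io.StringIO(
    "geneA\t10\n"
    "geneB\t5\n"
    "geneC\t0\n"
    "__no_feature\t3\n"
    "__ambiguous\t2\n"
    "__too_low_aQual\t0\n"
    "__not_aligned\t0\n"
    "__alignment_not_unique\t12\n"
)
gene_total, special, unique_like = summarize_htseq(toy_counts)
print(gene_total, unique_like, special["__alignment_not_unique"])
```

On a real run you would pass an open file handle for the counts file instead of the toy `StringIO`, and compare `unique_like` with the uniquely mapped read number in STAR's Log.final.out.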


So what you're saying is that STAR counts each multiply-mapped read once, but HTSeq counts them as many times as they're mapped? That makes sense. Thank you!
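
To see the difference between the two ways of counting, here is a rough sketch that tallies multi-mapped reads and multi-mapped alignment records from SAM text. It assumes single-end data and that the aligner writes the NH tag (number of reported alignments for the read). The helper names `sam_record` and `count_multimappers` and the toy records are hypothetical.

```python
def sam_record(qname, pos, mapq, nh):
    """Build a minimal single-end SAM line (toy data for illustration)."""
    fields = [qname, "0", "chr1", str(pos), str(mapq), "4M",
              "*", "0", "0", "ACGT", "IIII", "NH:i:%d" % nh]
    return "\t".join(fields)

def count_multimappers(sam_lines):
    """Return (distinct multi-mapped reads, multi-mapped alignment records)."""
    multi_reads = set()
    multi_alignments = 0
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & 0x4:          # skip unmapped records
            continue
        nh = 1                            # treat a missing NH tag as unique
        for tag in fields[11:]:
            if tag.startswith("NH:i:"):
                nh = int(tag[5:])
        if nh > 1:
            multi_alignments += 1         # one per alignment record
            multi_reads.add(fields[0])    # one per read name
    return len(multi_reads), multi_alignments

toy_sam = [
    "@HD\tVN:1.4",
    sam_record("read1", 100, 255, 1),     # unique
    sam_record("read2", 200, 1, 3),       # maps to 3 loci
    sam_record("read2", 900, 1, 3),
    sam_record("read2", 1500, 1, 3),
    sam_record("read3", 300, 3, 2),       # maps to 2 loci
    sam_record("read3", 700, 3, 2),
]
reads, alignments = count_multimappers(toy_sam)
print(reads, "multi-mapped reads,", alignments, "multi-mapped alignments")
```

In this toy input there are 2 multi-mapped reads but 5 multi-mapped alignment records, which is the same kind of gap as between the STAR and htseq-count numbers in the question.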


Exactly, that's my experience.
