htseqcount missing mates
0
0
Entering edit mode
4.2 years ago
vujex ▴ 10

Hello

I'm having a hard time understanding the behaviour of HTSeqcount.

I first sorted my bam files:

samtools sort sample1.bam > sample1_sorted.bam

Then I ran: htseq-count -f bam -m intersection-strict -s no -a 10 -t gene -i gene_id sample1_sorted.bam file.gff3 > sample1_count.txt

And I get:

Warning: 27293632 reads with missing mate encountered. 27159053 BAM alignment pairs processed

So I tried adding '-r pos' (which I found as solution on another post): htseq-count -f bam -m intersection-strict -s no -a 10 -t gene -i gene_id - r pos sample1_sorted.bam file.gff3 > sample1-count.txt

Now, I don't get this warning message anymore but:

Warning: Mate records missing for 10599 records, 9953038 BAM alignment pairs processed.

I guess this is much better, however when I compare the actual output, it seems like the first command worked better.

Output htseq without -r pos: geneA 0 geneB 5 geneC 16 geneD 1 geneE 16

Output htseq with -r pos : geneA 0 geneB 3 geneC 7 geneD 1 geneE 12

I'm wondering if anyone has any suggestion/idea what output is more reliable and/or any other options I could try? Thanks!

htseqcount samtools • 1.5k views
ADD COMMENT
0
Entering edit mode

small note: it's not because you get higher counts for genes that the result is 'better'.

correctly including the mate info will result in more precise mapping and might thus reduce the counts but they will be more reliable (which does not imply that your second approach is correct here)

ADD REPLY

Login before adding your answer.

Traffic: 1474 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6