Hello
I'm having a hard time understanding the behaviour of HTSeqcount.
I first sorted my bam files:
samtools sort sample1.bam > sample1_sorted.bam
Then I ran:
htseq-count -f bam -m intersection-strict -s no -a 10 -t gene -i gene_id sample1_sorted.bam file.gff3 > sample1_count.txt
And I get:
Warning: 27293632 reads with missing mate encountered. 27159053 BAM alignment pairs processed
So I tried adding '-r pos' (which I found as solution on another post):
htseq-count -f bam -m intersection-strict -s no -a 10 -t gene -i gene_id - r pos sample1_sorted.bam file.gff3 > sample1-count.txt
Now, I don't get this warning message anymore but:
Warning: Mate records missing for 10599 records, 9953038 BAM alignment pairs processed.
I guess this is much better, however when I compare the actual output, it seems like the first command worked better.
Output htseq without -r pos:
geneA 0
geneB 5
geneC 16
geneD 1
geneE 16
Output htseq with -r pos :
geneA 0
geneB 3
geneC 7
geneD 1
geneE 12
I'm wondering if anyone has any suggestion/idea what output is more reliable and/or any other options I could try? Thanks!
small note: it's not because you get higher counts for genes that the result is 'better'.
correctly including the mate info will result in more precise mapping and might thus reduce the counts but they will be more reliable (which does not imply that your second approach is correct here)