I mapped pair end RNAseq read to genome using HISAT2 and got statistics like this:
31026735 (100.00%) were paired; of these:
3579612 (11.54%) aligned concordantly 0 times
25541051 (82.32%) aligned concordantly exactly 1 time
1906072 (6.14%) aligned concordantly >1 times
----
3579612 pairs aligned concordantly 0 times; of these:
290787 (8.12%) aligned discordantly 1 time
----
3288825 pairs aligned 0 times concordantly or discordantly; of these:
6577650 mates make up the pairs; of these:
6068493 (92.26%) aligned 0 times
475539 (7.23%) aligned exactly 1 time
33618 (0.51%) aligned >1 times
90.22% overall alignment rate
Now how I can select uniquely and concordantly reads for raw read count using samtools?
Thanks Macspider! I am also interested to use HTseq for read count which only count uniquely mapped read. Could anyone suggest how to proceed with hisat2 sam file to do read count using HTseq?
Documentation is your friend! https://htseq.readthedocs.io/en/release_0.9.1/tour.html
This is very wrong. You did not understand the meaning of the option ZS. It only reports the alignment score of the secondary alignment which can be less than the primary one. As long as the primary one has a higher score, that alignment is considered uniquely mapped. The only way is to comare AS and ZS together; if they are equal it means that the alignment is actually a multimapper.