I am confused about the alignment stats I am getting and I really hope someone can explain them to me!
I ran HISAT2 with default parameters against the available grch38_tra index. The results HISAT2 reports look fine to me. Below is an example with an alignment rate of ~83%:
5389593 (28.98%) aligned concordantly 0 times
11974983 (64.39%) aligned concordantly exactly 1 time
1233844 (6.63%) aligned concordantly >1 times
----
5389593 pairs aligned concordantly 0 times; of these:
1021332 (18.95%) aligned discordantly 1 time
----
4368261 pairs aligned 0 times concordantly or discordantly; of these:
8736522 mates make up the pairs; of these:
6676246 (76.42%) aligned 0 times
1714031 (19.62%) aligned exactly 1 time
346245 (3.96%) aligned >1 times
This makes sense to me, but the Qualimap results confuse me:
Number of mapped reads (left/right): 15,693,967 / 14,826,627
Number of aligned pairs (without duplicates): 13,208,827
Total number of alignments: 42,737,940
Number of secondary alignments: 12,217,346
Number of non-unique alignments: 15,018,799
Aligned to genes: 10,778,652
Ambiguous alignments: 1,313,140
No feature assigned: 15,611,447
Missing chromosome in annotation: 15,902
Not aligned: 6,676,246
Strand specificity estimation (fwd/rev): 0.03 / 0.97
What really threw me was the Total number of alignments: 42,737,940.
15,693,967 + 14,826,627 = 30,520,594 reads, which matches the HISAT2 results: 11974983*2 + 1233844*2 + 1021332*2 + 1714031 + 346245 = 30,520,594 reads.
42,737,940 - 30,520,594 = 12,217,346 secondary alignments. This seems like a lot, and now I am worried something has gone wrong...
But HISAT2 says 1233844 (6.63%) aligned concordantly >1 times and 346245 (3.96%) aligned >1 times, which doesn't seem so bad.
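The reconciliation above can be reproduced in a few lines of Python. This is only a sanity-check sketch: the variable names are made up for illustration, the counts are copied from the two reports, and it assumes that every alignment beyond one per mapped read is a secondary alignment.

```python
# Counts copied from the HISAT2 summary above (pairs unless noted).
concordant_once = 11_974_983
concordant_multi = 1_233_844
discordant_once = 1_021_332
mates_once = 1_714_031   # single mates, not pairs
mates_multi = 346_245    # single mates, not pairs

# Counts copied from the Qualimap report above.
mapped_left, mapped_right = 15_693_967, 14_826_627
total_alignments = 42_737_940
reported_secondary = 12_217_346

# Each aligned pair contributes two mapped reads; a lone mate contributes one.
mapped_from_hisat2 = (
    2 * (concordant_once + concordant_multi + discordant_once)
    + mates_once
    + mates_multi
)
mapped_from_qualimap = mapped_left + mapped_right

# Assumption: every alignment beyond one per mapped read is secondary.
implied_secondary = total_alignments - mapped_from_qualimap

print(mapped_from_hisat2 == mapped_from_qualimap)
print(implied_secondary == reported_secondary)
```

Both comparisons hold for the numbers quoted here, so the two tools agree on how many reads mapped; the extra alignments are all accounted for by the secondary count.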
How does this go together? Does this mean that a small number of reads map very often? As far as I know, HISAT2 allows a maximum of k=5 distinct alignments in default mode. Does it mean that most of the 1233844*2 + 346245 reads map around 5 times (and possibly more often if I had allowed a higher k)?
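One way to put a number on this is to divide the secondary alignments by the reads that HISAT2 reports as multi-mapped. The sketch below uses hypothetical variable names and assumes that only those reads carry secondary alignments, which may not be strictly true.

```python
# Counts copied from the HISAT2 summary and the Qualimap report above.
concordant_multi_pairs = 1_233_844  # pairs aligned concordantly >1 times
multi_mates = 346_245               # lone mates aligned >1 times
secondary_alignments = 12_217_346   # Qualimap: Number of secondary alignments

# Assumption: only these multi-mapped reads carry secondary alignments.
multi_mapped_reads = 2 * concordant_multi_pairs + multi_mates

secondary_per_read = secondary_alignments / multi_mapped_reads
alignments_per_read = secondary_per_read + 1  # add the primary alignment

print(f"multi-mapped reads: {multi_mapped_reads:,}")
print(f"mean secondary alignments per read: {secondary_per_read:.2f}")
print(f"mean alignments per read: {alignments_per_read:.2f}")
```

With these counts the mean works out to roughly 4.3 secondary alignments per multi-mapped read, or about 5.3 alignments in total. That is slightly above 5, so the assumption probably does not hold exactly: some secondary alignments may belong to reads the summary counts in another category, or the effective k for this run may differ from 5.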
Is this how the Number of secondary alignments and the Number of non-unique alignments relate to each other? The Number of non-unique alignments would then be the secondary alignments plus the number of multi-mappers set as primary: 1233844*2 + 346245 + 12,217,346 = 15,031,279, which is close to Number of non-unique alignments: 15,018,799.
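The proposed relationship can be checked numerically. The following is a minimal sketch with hypothetical variable names; it tests the guess made above (non-unique = multi-mapped primaries + secondaries) and is not a statement of how Qualimap actually defines the metric.

```python
# Counts copied from the HISAT2 summary and the Qualimap report above.
concordant_multi_pairs = 1_233_844
multi_mates = 346_245
secondary_alignments = 12_217_346
reported_non_unique = 15_018_799  # Qualimap: Number of non-unique alignments

# Guess being tested: non-unique = multi-mapped primaries + all secondaries.
multi_mapped_primaries = 2 * concordant_multi_pairs + multi_mates
estimated_non_unique = multi_mapped_primaries + secondary_alignments

gap = estimated_non_unique - reported_non_unique
relative_gap = gap / reported_non_unique

print(f"estimated non-unique alignments: {estimated_non_unique:,}")
print(f"gap to reported value: {gap:,} ({relative_gap:.3%})")
```

The estimate overshoots the reported value by 12,480 alignments, which is under 0.1%. The guess is therefore a good approximation, but the two numbers are not defined identically.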
Is this something to worry about? I see this with most of my samples. What do you use as a cutoff/threshold for multi-mapping reads when quality-checking your samples? Thanks for your input!
An explanation of the HISAT stats can be found here:
A: Evaluation of HISAT2 Alignment Result
Thanks for the link. I understand the HISAT2 results; my question was more about the Qualimap results and whether the Total number of alignments / Number of secondary alignments is too high. Having said that, I also get similar results for human ENCODE samples.
How long are your reads?
The reads are 100 bp long and paired-end.