6.6 years ago
Vasu
Hi,
I'm using hisat2 to align reads to the genome. For a few samples, the mapping statistics reported by hisat2 differ from those reported by bamqc from qualimap.
Hisat2 output:
37317546 reads; of these:
37317546 (100.00%) were paired; of these:
14771091 (39.58%) aligned concordantly 0 times
7081700 (18.98%) aligned concordantly exactly 1 time
15464755 (41.44%) aligned concordantly >1 times
----
14771091 pairs aligned concordantly 0 times; of these:
1186424 (8.03%) aligned discordantly 1 time
----
13584667 pairs aligned 0 times concordantly or discordantly; of these:
27169334 mates make up the pairs; of these:
22785681 (83.87%) aligned 0 times
1973892 (7.27%) aligned exactly 1 time
2409761 (8.87%) aligned >1 times
69.47% overall alignment rate
For the same sample, the "qualimap bamqc" results on the BAM file are as follows:
Reference
number of bases = 3,099,750,718 bp
number of contigs = 194
Globals
number of windows = 593
number of reads = 202,671,876
number of mapped reads = 179,886,195 (88.76%)
number of mapped paired reads (first in pair) = 90,666,939
number of mapped paired reads (second in pair) = 89,219,256
number of mapped paired reads (both in pair) = 171,622,685
number of mapped paired reads (singletons) = 8,263,510
number of mapped bases = 30,000,606,541 bp
number of sequenced bases = 8,238,989,876 bp
number of aligned bases = 0 bp
number of duplicated reads (estimated) = 95,761,476
duplication rate = 25.6%
Insert size
mean insert size = 29,714.41
std insert size = 464,081.65
median insert size = 1199
Mapping quality
mean mapping quality = 13.82
ACTG content
number of A's = 1,679,640,440 bp (20.39%)
number of C's = 2,133,982,067 bp (25.9%)
number of T's = 1,805,802,126 bp (21.92%)
number of G's = 2,619,565,243 bp (31.79%)
number of N's = 0 bp (0%)
GC percentage = 57.7%
Mismatches and indels
general error rate = 0
number of mismatches = 32,158,659
number of insertions = 876,201
mapped reads with insertion percentage = 0.49%
number of deletions = 174,885
mapped reads with deletion percentage = 0.1%
homopolymer indels = 24.62%
In the hisat2 output the overall alignment rate is 69.47%, but in the bamqc results the number of mapped reads is 88%. Which one is right?
Both metrics are right. In your BAM file, 88% of the entries are mapped. Of your input reads, only 69% are mapped (once or more than once). The multiple alignments cause the difference.
OK. And how can I get the percentage of unmapped reads? Are the 88% of mapped reads mapped only once?
The percentage of unmapped reads (compared to the total number of reads) is 30.53%.
The percentage of unmapped reads (compared to the total number of alignments in the BAM file) is about 12% (more precisely, 100% - 88.76% = 11.24%).
Thank you. Could you please explain how the total number of reads and the total number of alignments differ? And could you also tell me how you calculated the above percentages?
Because one read can have more than one alignment:
(41.44%) aligned concordantly >1 times
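To illustrate how multiple alignments inflate the number of BAM entries, here is a minimal sketch that counts entries versus distinct reads from SAM flag values. The flag bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary) come from the SAM specification; the function name `summarize_flags` and the toy flag list are made up for this example, and it assumes the aligner writes the extra alignments as secondary records.

```python
# SAM flag bits as defined in the SAM specification
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def summarize_flags(flags):
    """Count BAM entries and distinct reads from an iterable of SAM flags.

    Hypothetical helper for illustration only: a "read" here is any
    primary record (neither secondary nor supplementary).
    """
    entries = mapped_entries = reads = mapped_reads = 0
    for flag in flags:
        is_mapped = not (flag & UNMAPPED)
        is_primary = not (flag & (SECONDARY | SUPPLEMENTARY))
        entries += 1
        mapped_entries += is_mapped
        if is_primary:
            reads += 1
            mapped_reads += is_mapped
    return {
        "entries": entries,
        "mapped_entries": mapped_entries,
        "reads": reads,
        "mapped_reads": mapped_reads,
    }


# Toy data: one pair with a primary and a secondary alignment for each mate
# (99/147 primary, 355/403 = same flags + 0x100), plus one unmapped pair (77/141).
toy_flags = [99, 147, 355, 403, 77, 141]
summary = summarize_flags(toy_flags)

per_read_rate = 100.0 * summary["mapped_reads"] / summary["reads"]
per_entry_rate = 100.0 * summary["mapped_entries"] / summary["entries"]
print(summary)
print(f"mapped, per read:  {per_read_rate:.1f}%")
print(f"mapped, per entry: {per_entry_rate:.1f}%")
```

In this toy case half of the reads are mapped, but two thirds of the entries are mapped, because the multi-mapping pair is counted twice in the per-entry view.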
100% - 69.47% = 30.53% (1 - (number of reads that map at least once/total number of reads) = proportion of unmapped reads)
100% - 88% = 12% (1 - (number of mapped entries / total number of entries in the BAM file) = proportion of unmapped entries in the BAM file)
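The same arithmetic can be reproduced from the numbers posted in the question. This is only a sketch: the variable names are made up, and reading qualimap's "number of reads" as the number of BAM entries (multiple alignments included) is an inference from the figures, not something the reports state.

```python
# Numbers copied from the hisat2 summary in the question
pairs = 37_317_546
mates_total = 2 * pairs          # each pair contributes two mates
mates_unaligned = 22_785_681     # "aligned 0 times" line

# Numbers copied from the qualimap bamqc report
bam_entries = 202_671_876        # "number of reads"
bam_mapped = 179_886_195         # "number of mapped reads"


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


hisat2_aligned = pct(mates_total - mates_unaligned, mates_total)
hisat2_unaligned = pct(mates_unaligned, mates_total)
bam_mapped_pct = pct(bam_mapped, bam_entries)
bam_unmapped_pct = pct(bam_entries - bam_mapped, bam_entries)

print(f"hisat2 aligned:   {hisat2_aligned}%")
print(f"hisat2 unaligned: {hisat2_unaligned}%")
print(f"BAM mapped:       {bam_mapped_pct}%")
print(f"BAM unmapped:     {bam_unmapped_pct}%")

# The absolute number of unmapped mates is the same in both reports;
# only the denominator differs.
print(bam_entries - bam_mapped == mates_unaligned)
```

Both reports agree on 22,785,681 unmapped mates. The percentages differ only because hisat2 divides by the number of input mates, while the BAM-based figure divides by the larger number of entries.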
Thank you very much. I think there is a typo in your comment. It should be 100% - 88% = 12%.