20 months ago
Jalil Sharif
Hello,
I have 7.5 million artificial 25-mer reads, which I mapped against a reference genome using bowtie2.
The following options were used: bowtie2 -k 2 --very-sensitive
7558491 reads; of these:
7558491 (100.00%) were unpaired; of these:
1399350 (18.51%) aligned 0 times
5200484 (68.80%) aligned exactly 1 time
958657 (12.68%) aligned >1 times
81.49% overall alignment rate
These are the results. I want to remove all the unmapped and multi-mapped reads. I understand that I have to take mapping quality into account, but as these are artificial reads, would I still use:
samtools view -F 4 -q 2 test.bam | wc -l
samtools does not know or care that the reads were artificial. It is going to carry out the operation you are asking it to do. Hopefully you controlled for that already in your bowtie command.

That is obvious, but it does not address the question of what the most appropriate way is to remove reads that map to multiple locations.
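One commonly suggested approach, sketched here under stated assumptions rather than as a definitive recipe: drop unmapped records (flag 0x4) and secondary records (flag 0x100), then discard any read carrying the XS:i tag, which the bowtie2 manual describes as present only when more than one alignment was found for the read. The toy SAM records below (read names r1 to r3, coordinates and tags) are invented for illustration, and awk is used so the sketch runs without samtools.

```shell
#!/bin/sh
# Toy SAM records (hypothetical, invented for illustration).
# Columns: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL [tags]
sam=$(mktemp)
{
  printf 'r1\t0\tchr1\t10\t42\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:0\n'           # single alignment
  printf 'r2\t0\tchr1\t50\t1\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:0\tXS:i:0\n'    # multi-mapper, primary record
  printf 'r2\t256\tchr1\t200\t1\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:0\tXS:i:0\n' # multi-mapper, secondary record
  printf 'r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n'                         # unmapped
} > "$sam"

# Keep only reads that are mapped, primary, and have no XS:i tag.
kept=$(awk -F '\t' '
  /^@/                   { next }  # skip header lines, if any
  int($2 / 4) % 2 == 1   { next }  # flag 0x4 set: unmapped
  int($2 / 256) % 2 == 1 { next }  # flag 0x100 set: secondary alignment
  /\tXS:i:/              { next }  # a second alignment was found
  { print $1 }
' "$sam")

echo "$kept"
rm -f "$sam"
```

On a real BAM the same idea would be along the lines of `samtools view -F 260 test.bam | grep -v 'XS:i:' | wc -l` (260 = 4 + 256). That command is untested on this data set, so compare its count against the "aligned exactly 1 time" line of the bowtie2 summary before trusting it.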
Mapping qualities are not handled the same way by different aligners. So with bowtie2 you could use a MAPQ filter of >=40 to get reads which had only 1 convincing alignment, as noted in the linked blog post.

I ran
samtools view -F 4 -q 42 test.bam | wc -l
and I get 6159141. Why is there a discrepancy, as originally only 5200484 aligned exactly once? When I did the calculations, 6159141 - 5200484 = 958657, so I am still retaining the 958657 multi-mapped reads.
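The arithmetic can be checked directly against the bowtie2 summary. The short sketch below uses only the numbers reported in this thread; the variable names are illustrative.

```shell
#!/bin/sh
# Counts copied from the bowtie2 summary in the question.
total=7558491
unaligned=1399350
unique=5200484
multi=958657

# The three categories should account for every read.
sum=$((unaligned + unique + multi))

# Reads with at least one alignment (unique + multi-mapped).
aligned=$((unique + multi))

# Overall alignment rate as a percentage with two decimals.
rate=$(awk -v a="$aligned" -v t="$total" 'BEGIN { printf "%.2f", 100 * a / t }')

echo "sum=$sum aligned=$aligned rate=$rate%"
```

The aligned total, 6159141, is exactly the count returned with -F 4 -q 42. In terms of counts, the filter removed as many records as there are unaligned reads and nothing more, so the MAPQ threshold did not reduce the total by the number of multi-mapped reads.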