Artificial reads - remove multiple mapped reads against reference genome, and only keep reads that completely match without any mismatches - samtools
20 months ago
Jalil Sharif ▴ 80

Hello,

I have 7.5 million artificial 25-mer reads, which I mapped against a reference genome using bowtie2.

The following options were used: bowtie2 -k 2 --very-sensitive

7558491 reads; of these:
  7558491 (100.00%) were unpaired; of these:
    1399350 (18.51%) aligned 0 times
    5200484 (68.80%) aligned exactly 1 time
    958657 (12.68%) aligned >1 times
81.49% overall alignment rate

These are the results. I want to remove all the unmapped and multi-mapped reads. I understand that I have to take mapping quality into account, but as these are artificial reads, should I still use:

samtools view -F 4 -q 2 test.bam | wc -l
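For readers unsure what that command selects: `-F 4` drops records whose FLAG has the unmapped bit (0x4) set, and `-q 2` drops records with MAPQ below 2. The following is a minimal Python sketch of that logic on toy SAM lines. The records are invented for illustration, and the MAPQ of 255 on the secondary record is an assumption about what an aligner might emit, not output from this dataset.

```python
# Toy SAM records (hypothetical): QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
records = [
    "r1\t0\tchr1\t100\t42\t25M\t*\t0\t0\t*\t*",    # mapped, high MAPQ
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",            # unmapped (bit 0x4 set)
    "r3\t16\tchr1\t500\t1\t25M\t*\t0\t0\t*\t*",    # mapped, reverse strand, MAPQ 1
    "r4\t256\tchr2\t900\t255\t25M\t*\t0\t0\t*\t*", # secondary alignment, assumed MAPQ 255
]

UNMAPPED = 0x4


def passes(line, exclude_flags=UNMAPPED, min_mapq=2):
    """Mimic `-F <exclude_flags> -q <min_mapq>` for a single SAM line."""
    fields = line.split("\t")
    flag = int(fields[1])
    mapq = int(fields[4])
    # -F: reject if ANY of the excluded bits is set; -q: reject if MAPQ is too low
    return (flag & exclude_flags) == 0 and mapq >= min_mapq


kept = [r.split("\t")[0] for r in records if passes(r)]
print(kept)
```

Note that in this sketch the secondary record (FLAG 256) survives, because `-F 4` only tests the unmapped bit and says nothing about secondary alignments.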


samtools does not know or care that the reads were artificial. It is going to carry out the operation you are asking it to do.

only keep reads that completely match without any mismatches

Hopefully you controlled for that already in your bowtie command.


That is obvious, but it does not address the question of what the most appropriate way is to remove reads that map to multiple locations.


Mapping qualities are not handled the same way by different aligners. So with bowtie2 you could use a MAPQ filter of >=40 to get reads which had only 1 convincing alignment as noted in the linked blog post.
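As a rough illustration of such a filter, here is a Python sketch that keeps only primary, mapped records with MAPQ >= 40 and, to address the "no mismatches" part of the question, also requires `NM:i:0` and no `XS:i` tag. It assumes bowtie2-style optional tags (`NM:i` as edit distance, `XS:i` present only when another alignment was found). The toy records and the helper names `parse_tags` and `is_unique_perfect` are hypothetical.

```python
# Toy SAM records (hypothetical); optional tags start at column 12.
records = [
    "u1\t0\tchr1\t100\t42\t25M\t*\t0\t0\t*\t*\tAS:i:0\tNM:i:0",          # unique, perfect
    "u2\t0\tchr1\t300\t42\t25M\t*\t0\t0\t*\t*\tAS:i:-6\tNM:i:1",         # unique, 1 mismatch
    "m1\t0\tchr1\t700\t1\t25M\t*\t0\t0\t*\t*\tAS:i:0\tXS:i:0\tNM:i:0",   # multi-mapped, primary
    "m1\t256\tchr2\t50\t255\t25M\t*\t0\t0\t*\t*\tAS:i:0\tXS:i:0\tNM:i:0", # multi-mapped, secondary
    "x1\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",                                  # unmapped
]

UNMAPPED = 0x4
SECONDARY = 0x100


def parse_tags(fields):
    """Return optional SAM tags as a dict, e.g. {'NM': '0', 'AS': '0'}."""
    tags = {}
    for item in fields[11:]:
        name, _type, value = item.split(":", 2)
        tags[name] = value
    return tags


def is_unique_perfect(line, min_mapq=40):
    fields = line.rstrip("\n").split("\t")
    flag, mapq = int(fields[1]), int(fields[4])
    if flag & (UNMAPPED | SECONDARY):  # drop unmapped and secondary records
        return False
    if mapq < min_mapq:                # drop low-confidence placements
        return False
    tags = parse_tags(fields)
    # No second-best alignment reported, and edit distance of zero
    return "XS" not in tags and tags.get("NM") == "0"


kept = [r.split("\t")[0] for r in records if is_unique_perfect(r)]
print(kept)
```

The flag mask used here (0x4 plus 0x100) corresponds to `-F 260` on the samtools command line. Whether the tag checks are needed on top of the MAPQ threshold depends on the aligner settings, so treat them as an optional extra.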


I ran samtools view -F 4 -q 42 test.bam | wc -l and got 6159141. Why is there a discrepancy, given that originally only 5200484 reads aligned exactly once?

When I did the calculation, 6159141 - 5200484 = 958657, so I am still retaining the 958657 multi-mapped reads.
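One thing worth checking here is that `wc -l` counts alignment records, not reads, and with `-k 2` a read can contribute up to two records. The Python sketch below tallies mapped records per read name so the two counts can be compared. It runs on invented toy SAM lines, and the function name `summarize` is hypothetical. It is a way to inspect the output, not a diagnosis of this particular BAM file.

```python
from collections import Counter

# Toy SAM text (hypothetical): one header line, then alignment records.
sam_lines = [
    "@HD\tVN:1.0\tSO:unsorted",
    "a\t0\tchr1\t100\t42\t25M\t*\t0\t0\t*\t*",
    "b\t0\tchr1\t200\t1\t25M\t*\t0\t0\t*\t*",
    "b\t256\tchr2\t300\t255\t25M\t*\t0\t0\t*\t*",  # second record for read b
    "c\t16\tchr3\t400\t42\t25M\t*\t0\t0\t*\t*",
    "d\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",             # unmapped
]


def summarize(lines):
    """Return (mapped_records, reads_with_one_record, reads_with_several)."""
    per_read = Counter()
    for line in lines:
        if line.startswith("@"):   # skip header lines
            continue
        fields = line.split("\t")
        if int(fields[1]) & 0x4:   # skip unmapped records
            continue
        per_read[fields[0]] += 1
    mapped_records = sum(per_read.values())
    single = sum(1 for n in per_read.values() if n == 1)
    several = sum(1 for n in per_read.values() if n > 1)
    return mapped_records, single, several


mapped_records, single, several = summarize(sam_lines)
print(mapped_records, single, several)
```

In the toy data there are four mapped records but only three mapped reads, of which two have a single record. Running the same tally on the real output (for example on `samtools view` text piped into a script) would show whether the extra 958657 lines come from reads that have more than one record.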
