Remove reads from bam file whose mate has been already filtered out
2
0
Entering edit mode
9.5 years ago
iraun 6.2k

Hi all,

I know this is a basic question, or in my mind sounds "basic", but I can not figure out the solution. I have a bam file that has been already filtered according to the quality:

samtools view -@ 6 -b -q 30 raw.bam > filtered.bam

I would like to remove from filtered.bam, all the reads whose mate is not present (because it has been removed during quality filter).

Is there any tool to perfom this? Or I should create a script, search for each read ID and look if the mate is present? Also, it could be nice to do this while doing the quality filter, but I don't think that its possible.

Any suggestion?

EDIT: I think that I could use something like this:

awk '{if (x[$1]) { x_count[$1]++; print $0; if (x_count[$1] == 1) { print x[$1] } } x[$1] = $0}' filtered.bam
bam samtools • 3.0k views
ADD COMMENT
0
Entering edit mode

Your awk script will not work because the information (forward/reverse' is in the sam-flag (not in the name) and because you cannot put a large bam in memory.

ADD REPLY
0
Entering edit mode

Thanks for pointing out the memory problem, hopefully my specific bam is quite small. I guess I could also check the sequence field and see if the sequence of "two" mates is equal or not, and if it is, discard them.

ADD REPLY
0
Entering edit mode

You may want to check whether the aligner you're using will even give mates different MAPQs if they're not mapped as singletons (many won't). Your aligner might also append the MQ auxiliary tag. These would make the task much faster and let you avoid any sorting.

ADD REPLY
3
Entering edit mode
9.5 years ago

I would

  • sort by name using samtools sort
  • use awk to grab consecutive reads having the same name
  • only print the reads where R1 and R2 have been found
ADD COMMENT
0
Entering edit mode
9.5 years ago
khikho ▴ 100

Use the concept priority queue if you want to implement it by yourself otherwise use bam-fixpair command of package biohazard .

https://github.com/udo-stenzel/biohazard

ADD COMMENT

Login before adding your answer.

Traffic: 1837 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6