How to remove un-paired reads from BAM after filtering?
1
0
Entering edit mode
22 months ago

I subset a BAM file by some specific inter-chromosomal regions (e.g. samtools view chrXYZ) resulting in the loss of some mates when the read pairs overlapped the edges of these regions. I want to remove any remaining singleton reads after this but the obvious the standard method based on the sam flags doesn't work.

I can identify the culprits using:

java -Xmx10g -jar picard.jar ValidateSamFile I=in.bam O=out.log

But I can't seem to find a tool that can remove them automatically? Was hoping there was a parameter somewhere in a picard/samtool tool but wasn't able to find it (e.g. IGNORE_MISSING_MATES=false). Rather than having to extract the read ids and doing it manually (these are huge BAMs).

Anyone have any suggestions? Thanks.

picard samtools • 1.8k views
ADD COMMENT
1
Entering edit mode

With BBTools

reformat.sh -Xmx4g in=your.bam out=filtered.bam pairedonly=t
ADD REPLY
0
Entering edit mode

actually this doesn't work... it still seems to do this based on the SAM flags. i'm trying repair.sh but it is extremely slow

ADD REPLY
0
Entering edit mode

Thanks for the observation.

How are you using repair.sh? If you know the read id's of the orphan reads perhaps it would be easier to filter the BAM file to remove those lines?

ADD REPLY
2
Entering edit mode
22 months ago

Since most tools use the SAM flags to identify unpaired reads (whereas here this is not the case as the reads are removed manually)

I decided to do the following:

  • sort BAM file by read name
  • keep only entries where read names occur consecutively and are identical

Here is the command that sorts, filters, converts/re-sorts (I might be using an older version of samtools):

samtools sort -n -O BAM -o input.bam -@ 5 input_sorted.bam; samtools view -h input_sorted.bam | awk -f script.awk - > final.sam; samtools sort -@ 5 -O BAM -o final.bam final.sam; rm final.sam; samtools index -@ 5 final.bam

For the script.awk, most of it was written by ChatGPT... :

#!/usr/bin/awk -f
BEGIN {
  prev = ""
  prevline = ""
}

{
    if ($0 ~ "^@") {
    print $0
    next
    }
    if ($1 == prev) {
    print prevline
    print $0
    } else {
    prev = $1
    prevline = $0
    }
}
ADD COMMENT
0
Entering edit mode

NOTE: This requires that the read names for paired mates are actually identical (e.g. no \1 \2 etc...)

ADD REPLY

Login before adding your answer.

Traffic: 1540 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6