I subset a BAM file to some specific inter-chromosomal regions (e.g. samtools view chrXYZ), which resulted in the loss of some mates when the read pairs overlapped the edges of these regions. I want to remove any remaining singleton reads after this, but the standard method based on the SAM flags doesn't work.
I can identify the culprits using:
java -Xmx10g -jar picard.jar ValidateSamFile I=in.bam O=out.log
But I can't seem to find a tool that can remove them automatically. I was hoping there was a parameter somewhere in a Picard/samtools tool (e.g. IGNORE_MISSING_MATES=false), but I wasn't able to find one. I'd rather not have to extract the read IDs and do it manually (these are huge BAMs).
Anyone have any suggestions? Thanks.
With BBTools
Actually, this doesn't work... it still seems to do this based on the SAM flags. I'm trying repair.sh, but it is extremely slow.
Thanks for the observation.
How are you using repair.sh? If you know the read IDs of the orphan reads, perhaps it would be easier to filter the BAM file to remove those lines?
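If extracting the IDs first is too painful, one alternative is to decide by which records are actually present rather than by what the flags claim. Below is a minimal sketch of that idea in plain Python, not a tested tool. It assumes the input is SAM text in which all records sharing a read name are adjacent (e.g. after a name sort), and it keeps a name group only if it contains both a primary first-in-pair and a primary second-in-pair record. The function name drop_orphans and the "-" argument convention are made up for this example.

```python
import sys
from itertools import groupby

# Standard SAM flag bits.
READ1 = 0x40
READ2 = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def drop_orphans(lines):
    """Yield SAM lines, dropping read-name groups that lack one of the mates.

    Assumption: records with the same QNAME are adjacent (name-grouped input).
    Header lines (starting with "@") are passed through unchanged.
    """
    def key(line):
        # Headers get key None; alignments are grouped by QNAME (column 1).
        return None if line.startswith("@") else line.split("\t", 1)[0]

    for name, group in groupby(lines, key=key):
        group = list(group)
        if name is None:
            yield from group
            continue
        seen = 0
        for line in group:
            flag = int(line.split("\t", 2)[1])
            # Only primary alignments count as evidence that a mate is present.
            if not flag & (SECONDARY | SUPPLEMENTARY):
                seen |= flag & (READ1 | READ2)
        # Keep the whole group only when both mates were actually observed.
        if seen == (READ1 | READ2):
            yield from group


# Runs as a stdin-to-stdout filter only when called with "-" as the sole argument.
if __name__ == "__main__" and sys.argv[1:] == ["-"]:
    sys.stdout.writelines(drop_orphans(sys.stdin))
```

Saved as, say, drop_orphans.py (a hypothetical filename), it could sit in a pipe along the lines of: samtools sort -n in.bam | samtools view -h - | python drop_orphans.py - | samtools sort -o paired.bam -. Because it only holds one read-name group in memory at a time, the size of the BAM should mostly cost you the name sort, though I have not benchmarked this against repair.sh.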