Filter out Bam not overlapping with Bed File (keeping a read and its mate)?
0
0
Entering edit mode
2.1 years ago
Eliveri ▴ 350

I have a Bam file for which I want to REMOVE reads overlapping with a given Bed file (so keep all other reads/alignments). I also want to make sure reads and their mates are always kept.

I am using the following code to filter out a given bed file from bam.

samtools view -L in.bed -O BAM --unoutput output.bam input.bam > /dev/null

After I obtained the output.bam. I sorted the output.bam and convert to fastq (example below). However 663286 singletons were discarded. Is there any way to make sure the mates or reads are always kept?

samtools sort -n -m 5G -@ 2 output.bam -o output.sorted.bam
samtools fastq -@ 8 output.sorted.bam \
  -1 output_R1.fastq.gz \
  -2 output_R2.fastq.gz \
  -0 /dev/null -s /dev/null -n

I tried samtools view with --fetch-pairs, but the documentation says "Note that this option does not work with the -c, --count; -U, --output-unselected; or -p, --unmap options." I am not sure if this also includes --unoutput

samtools view -L in.bed -O BAM --unoutput output.bam --fetch-pairs input.bam > /dev/null

I am not sure if I am missing something.

samtools view documentation: http://www.htslib.org/doc/samtools-view.html

samtools bam bed • 2.2k views
ADD COMMENT
1
Entering edit mode

Have you tried using bedtools intersect with -v option?

-v Only report those entries in A that have no overlap in B. Restricted by -f and -r.

bedtools intersect -a input.bam -b in.bed -v

I am not sure about the behaviour of bedtools when one mate falls within the interval while the other does not though. One option could be to extract the readnames of the reads whose mate is missing, and use this readname to "grep" the other mate that did not pass the filter from the bam file.

ADD REPLY
1
Entering edit mode

Thank you for the suggestion!

ADD REPLY
1
Entering edit mode

Will the complement work if the original Bam file is mapped to a combined reference genome? So the Bam file is mapped to a parasite + human. I want to filter out only human reads using a human.bed.

ADD REPLY
0
Entering edit mode

As far as I understand that's a XY problem. https://xyproblem.info/ There is no BED here, but chromosome names. As far as I understand, You want a modified version of Extracting chimeric reads from mapping

ADD REPLY
1
Entering edit mode

Thank you fro the information!

ADD REPLY

Login before adding your answer.

Traffic: 3701 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6