I have a Bam file for which I want to REMOVE reads overlapping with a given Bed file (so keep all other reads/alignments). I also want to make sure reads and their mates are always kept.
I am using the following code to filter out a given bed file from bam.
samtools view -L in.bed -O BAM --unoutput output.bam input.bam > /dev/null
After I obtained the output.bam. I sorted the output.bam and convert to fastq (example below). However 663286 singletons were discarded. Is there any way to make sure the mates or reads are always kept?
samtools sort -n -m 5G -@ 2 output.bam -o output.sorted.bam
samtools fastq -@ 8 output.sorted.bam \
-1 output_R1.fastq.gz \
-2 output_R2.fastq.gz \
-0 /dev/null -s /dev/null -n
I tried samtools view
with --fetch-pairs
, but the documentation says "Note that this option does not work with the -c, --count; -U, --output-unselected; or -p, --unmap options." I am not sure if this also includes --unoutput
samtools view -L in.bed -O BAM --unoutput output.bam --fetch-pairs input.bam > /dev/null
I am not sure if I am missing something.
samtools view documentation: http://www.htslib.org/doc/samtools-view.html
Have you tried using
bedtools intersect
with-v
option?I am not sure about the behaviour of bedtools when one mate falls within the interval while the other does not though. One option could be to extract the readnames of the reads whose mate is missing, and use this readname to "grep" the other mate that did not pass the filter from the bam file.
Thank you for the suggestion!
get the complement of your bed with bedtools and follow : Extract reads within given region, and their mates ; How to extract reads in a specific region from a BAM file and also their mates mapped elsewhere ? ; Filtering out of both read pairs from BAM file if any of the two align within interval ; Extract paired-end reads with one end falls within a given region from a bam file
Will the complement work if the original Bam file is mapped to a combined reference genome? So the Bam file is mapped to a parasite + human. I want to filter out only human reads using a human.bed.
As far as I understand that's a XY problem. https://xyproblem.info/ There is no BED here, but chromosome names. As far as I understand, You want a modified version of Extracting chimeric reads from mapping
Thank you fro the information!