3.0 years ago
Friederike
9.0k
What tool would you recommend to compare two BAM files and extract matching reads by read ID?
samtools view file1.bam | awk -F "\t" '{print $1}' | sort | uniq > names_in_file1
filterbyname.sh -Xmx4g in=file2.bam names=names_in_file1 out=file.fq.gz include=t
file.fq.gz will contain the reads from file2.bam whose names also occur in file1.bam, i.e. the reads common to both files.
There is this: https://genome.sph.umich.edu/wiki/BamUtil:_diff
@Pierre also seems to have a tool for this: Comparison between .bam files
Is there a way to do this without extracting the read names and doing the comparison outside the BAM files?
filterbyname.sh from BBMap would be an option. You can come up with a clever way of using pipes/process redirection. May post an example later.
Mostly looking for performance-savvy solutions (and general inspiration if there's not a specific tool that would do it).
duplicate:
Extract the alignments from a Bam file by name of the read
Efficiently Extracting Reads With Specific Names ('Queryname') From .Bam File
How To Extract A Subset Of Reads In Fastq Using An Id List?
....
Well, to be fair, I was mostly searching for a clever way to actually compare two BAM files directly, but it seems I'll have to go via extracting the read names first and then use those for subsetting (which is well covered in those posts).
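For what it's worth, here is a minimal sketch of that two-step route (extract the names, then subset) using only pipes and awk, so the result can stay in BAM instead of FASTQ. The function names `unique_read_names` and `filter_by_names` are made up for illustration, and the output name `common.bam` is hypothetical. It assumes the name list fits in memory and that read names are in column 1 of tab-separated SAM text, as `samtools view` prints them. Not benchmarked, so treat it as inspiration rather than the fastest option.

```shell
# Print each distinct read name (QNAME, column 1) from SAM text on stdin.
# Header lines start with "@", which a valid QNAME cannot, so they are skipped.
unique_read_names() {
    awk -F '\t' '!/^@/ && !seen[$1]++ { print $1 }'
}

# Keep SAM header lines plus every record whose read name is listed in file $1
# (one name per line). Reads SAM text on stdin, writes SAM text to stdout.
filter_by_names() {
    awk -F '\t' -v names="$1" '
        BEGIN { while ((getline line < names) > 0) keep[line] = 1 }
        /^@/ || ($1 in keep)
    '
}

# Hypothetical usage with the file names from this thread (needs samtools):
#   samtools view file1.bam | unique_read_names > names_in_file1
#   samtools view -h file2.bam | filter_by_names names_in_file1 \
#       | samtools view -b -o common.bam -
```

Deduplicating inside awk avoids the `sort | uniq` pass, and loading the names into a hash means file2.bam is streamed once. For very large name lists, memory use grows with the number of distinct read names.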