I can extract reads mapped to a single reference sequence by first aligning the reads using, say, BWA, and processing with SAMtools:
samtools view -b -f 0x2 alignment.sam | samtools fastq - -1 mapped_1.fastq -2 mapped_2.fastq
Is there a similar pipeline I can use to extract reads mapped to multiple references? That is, if a read was found in any of the given references, it should be included in the output.
Here's an idea I have:
- align the reads to each reference separately (producing multiple SAM files)
- extract the reads mapped to each reference using the above pipeline
sort
anduniq
the names of the reads from all output files, storing the names in file.txtgrep
the names in file.txt from one of the SAM files and store in a new SAM file- convert SAM to FASTQ
I'm guessing this is probably not the most efficient way to solve the problem. It also doesn't catch when a read might be mapped to one of the references without its pair. In this case, I'd like to extract both reads.
Concatenating the references into one fasta file worked just fine. Not the first time I spend way more time thinking and typing than is necessary...