So I think I've just read every single post about this on the internet.
I simply want to extract reads mapped to a certain region. This is the pipeline I'm currently using:
samtools view sorted.bam "chr3:1000-5000" | cut -f1 > ids
LC_ALL=C grep -wFf ids < original.sam > subset.sam # this is the bottleneck
echo -e "$(samtools view -H original.sam)\n$(cat subset.sam)" > subset.sam
samtools view -b subset.sam | samtools fastq -1 reads1.fq -2 reads2.fq -
This is not too bad for a single run, but I actually have to do this for 1000+ regions. Is there possibly a more efficient way to extract the reads, or is this as fast as it gets?
Edit: added updating-the-header step
Edit: Help
Could you clarify what the desired output is? Do you want fastq files, read IDs, a subset bam or something else?
The fastq files (paired end).