I have some BAM files that I want to split. The reference genome has 22 chromosomes and a bunch of unplaced scaffolds. I can easily split the BAM files chromosome by chromosome:
samtools view -b -o my_chromosomeI.bam my.bam chromosomeI
I can do the same scaffold by scaffold, but I would like to extract all scaffolds at once with some sort of wildcard (there are a lot of them) and have them all in the same BAM output. The wildcard part is where I'm having issues.
How about using the '-L' option of samtools view? I'd extract the chromosome and scaffold names and lengths from the header with
samtools view -H my.bam
and create a simple BED file out of them. The scaffolds you'd like to group go into separate BED files, which are then used as, e.g.,
samtools view -b -L scaffold_group1.bed -o my_scaffold_group1.bam my.bam
Additionally, I'd add the -h parameter to the samtools view calls to include the header (BAM output carries the header anyway; -h matters when the output is SAM).
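To make the header-to-BED step concrete, here is a rough sketch of how it could be done with awk. The function name header_to_bed and the pattern '^scaffold' are just my assumptions for illustration; check the @SQ lines of your own header to see how the scaffolds are actually named and adjust the pattern accordingly.

```shell
# header_to_bed PATTERN  (hypothetical helper, not part of samtools)
# Reads SAM header text on stdin and prints one BED line (name, 0, length)
# for every @SQ sequence whose name matches the awk regex PATTERN.
header_to_bed() {
  awk -F'\t' -v OFS='\t' -v pat="$1" '
    $1 == "@SQ" {
      name = ""; len = ""
      # SN: holds the sequence name, LN: its length
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SN:/) name = substr($i, 4)
        else if ($i ~ /^LN:/) len = substr($i, 4)
      }
      if (name != "" && len != "" && name ~ pat) print name, 0, len
    }'
}

# Intended use (assumes samtools is installed, my.bam exists, and the
# scaffold names start with "scaffold"):
#   samtools view -H my.bam | header_to_bed '^scaffold' > scaffolds.bed
#   samtools view -b -L scaffolds.bed -o my_scaffolds.bam my.bam
```

Each BED line spans the whole sequence (start 0, end = LN), so every read on a matching scaffold should end up in the single output BAM.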