8.4 years ago
scchess
I want to subsample my alignment file, but only on a single chromosome, say chr21. Alignments to any other chromosome must stay intact. I'm trying to come up with the easiest way; this is what I do now:
- Sort and index the alignment file (eg: sorted.bam)
- Get alignments for chr21 and save it to a new BAM file (eg: chr21.bam)
- Subsample the new BAM file (sampled.bam)
- Get alignments from the original alignment file, write out everything but chr21. Save it to a new BAM file (eg: no_chr21.bam)
- Merge no_chr21.bam with sampled.bam
Is there anything better than my method? It is slow and takes up unnecessary disk space.
The original post makes it sound like @student-t wants to sub-sample an entire chromosome. Would your solution (assuming fraction=1) be faster than samtools view region?
This method doesn't require a sorted BAM file, which might be a benefit (though it'll totally screw up read pairing, if that's important). This will end up being slightly slower than samtools view due to the Python overhead, but aside from that it'll be largely equivalent.
BTW, any chromosome that's not chr21 will be used in its entirety, which I assumed OP wanted (if not, I think one can do that directly in samtools).
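For reference, here is a minimal sketch of the single-pass idea described above: stream every read once, pass through anything not on the target chromosome, and keep target reads with some probability. It is not the code from the answer (which isn't reproduced in this thread); the function name subsample_one_chromosome and the get_chrom / get_name accessors are hypothetical names made up for illustration. The optional get_name path is an extra assumption on my part: it decides per read name instead of per read, so mates that both sit on the target chromosome are kept or dropped together.

```python
import random
import zlib


def subsample_one_chromosome(reads, get_chrom, target="chr21",
                             fraction=0.5, seed=0, get_name=None):
    """Yield reads, subsampling only those aligned to `target`.

    reads     -- any iterable of read records (order is preserved)
    get_chrom -- hypothetical accessor: record -> chromosome name
    get_name  -- optional hypothetical accessor: record -> read name;
                 if given, the keep/drop decision is made per name
    """
    rng = random.Random(seed)
    for read in reads:
        if get_chrom(read) != target:
            # Everything off the target chromosome stays intact.
            yield read
        elif get_name is not None:
            # Deterministic per-name decision: records sharing a name
            # fall into the same bucket and share the same fate.
            key = f"{seed}:{get_name(read)}".encode()
            if zlib.crc32(key) % 10_000 < fraction * 10_000:
                yield read
        elif rng.random() < fraction:
            # Independent per-read decision (can split read pairs).
            yield read
```

With pysam (an assumption, any BAM reader would do) the accessors would be something like lambda r: r.reference_name and lambda r: r.query_name, with the input opened via pysam.AlignmentFile and the surviving reads written to a second AlignmentFile opened with template= set to the input. No sorting or indexing is needed, and no intermediate BAM files are written. Note that a mate aligned to a different chromosome is always kept, so pairs spanning chr21 and another chromosome can still be split.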