How to subsample reads for a chromosome in a BAM file with samtools?
1
0
Entering edit mode
8.4 years ago
scchess ▴ 640

I want to subsample my alignment file, but to only a single chromosome, say chr21. Alignments to any other chromosome must stay intact. I'm trying to come up with the easiest way:

  1. Sort and index the alignment file (eg: sorted.bam)
  2. Get alignments for chr21 and save it to a new BAM file (eg: chr21.bam)
  3. Subsample the new BAM file (sampled.bam)
  4. Get alignments from the original alignment file, write out everything but chr21. Save it to a new BAM file (eg: no_chr21.bam)
  5. Merge no_chr21.bam with sampled.bam

Is there anything better than my methods? My method is slow, and take up unnecessary disk space.

samtools BAM • 5.1k views
ADD COMMENT
3
Entering edit mode
8.4 years ago

Have no fear, python and pysam are here:

#!/usr/bin/env python
import pysam
import random

bam = pysam.AlignmentFile("some alignment file.bam") # Change me
output = pysam.AlignmentFile("some other alignment file.bam", "wb", template=bam) # Change me
subsample_chrom = "chr21" # Change me if needed
fraction = 0.2 # Change me

for read in bam.fetch():
    if read.reference_name != subsample_chrom:
        output.write(read)
    else if random.random() < fraction:
        output.write(read)
bam.close()
output.close()

This will subsample ~20% of chr21. You can alter it from there to do whatever you need.

I should note that this is untested, so you might have to fix a typo or two.

ADD COMMENT
0
Entering edit mode

Original post makes it sound like @student-t wants to sub-sample an entire chromosome. Would your solution (assuming fraction=1) be faster then samtools view region?

ADD REPLY
0
Entering edit mode

This method doesn't require a sorted BAM file, which might be a benefit (though it'll totally screw up read pairing, if that's important). This will end up being slightly slower than samtools view due to the python overhead, but aside from that it'll be largely equivalent.

BTW, any chromosome that's not chr21 will be used in its entirety, which I assumed OP wanted (if not, I think one can do that directly in samtools).

ADD REPLY

Login before adding your answer.

Traffic: 2119 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6