Question

Filter Bam File Based On Coverage

2

Entering edit mode

12.7 years ago

Abhi ★ 1.6k

Hi Guys

Is there a tool out there that will filter read-pairs if any pair maps to a location with coverage < N.

I can in theory run the samtools pileup find the positions at which coverage is < N and read the bam file again and exclude the reads that are mapped to positions in my low coverage locations per chromosome.

This can be memory intensive in some cases where coverage is low and I am not sure if I find a read in low coverage region how do I make sure to exclude its mate.

Thanks! -Abhi

bam samtools sam • 9.6k views

ADD COMMENT • link updated 12.7 years ago by Jorge Amigo 14k • written 12.7 years ago by Abhi ★ 1.6k

0

Entering edit mode

you could use samtools depth tool and then create a bed from the output. This bed could be used to filter your bam. There must be an easier way?

ADD REPLY • link 12.7 years ago by Zev.Kronenberg 12k

0

Entering edit mode

What is the workflow that requires this manipulation--just curious?

ADD REPLY • link 12.7 years ago by Sean Davis 27k

0

Entering edit mode

just want to get rid of any mapping artifact and noise. We expect regions of interest to have high coverage. This is actually the same transcript start,end data points and I would like to remove reads in a region with < N coverage. May be I should also think about physical coverage.

ADD REPLY • link 12.6 years ago by Abhi ★ 1.6k

0

Entering edit mode

Abhi, How do yo get read a bam file and exclude the reads that are mapped to positions with low coverage?

ADD REPLY • link 7.2 years ago by cobalym7 • 0

0

Entering edit mode

first, you have to detect those low coverage regions. then, you have to filter your bam file with those regions. I would suggest you open a new question rather than commenting a previous related one.

ADD REPLY • link 7.2 years ago by Jorge Amigo 14k

score 2 · Answer 1 · 2012-04-10

I think that the idea behind this question is to generate a reduced BAM file based on coverage, leaving only the reads and regions which would actually be useful for any kind of downstream analysis Abhi may want to perform. I can only foresee 2 major set of applications for this idea, which in my opinion shouldn't be addressed this way: a) you want simplify a later coverage calculation, or b) you want to work only on regions with a coverage above certain threshold. in any of these cases, the usual proceeding is to deal directly with the entire BAM, setting filters/thresholds for the analysis to be performed on them. the tools that were designed to deal with BAM files are indeed optimized to perform these filters/thresholds when needed.

I could only understand performing such extra work if your intention is to perform a later intensive work on that BAM file, which just being significantly reduced on size would represent an interesting save of disk usage, hence you'll get reduced timings. if you still want to go for such filtering process, the easiest thing I can think of would be a first pass trying to generate a bed file with the regions of coverage above the desired threshold (bedtools' coverageBed should do the work, and it'll also be very fast), and then a second pass filtering the BAM file with those regions (samtools should do the work, and again that should be very fast indeed).