Filter Bam File Based On Coverage
1
2
Entering edit mode
12.7 years ago
Abhi ★ 1.6k

Hi Guys

Is there a tool out there that will filter read-pairs if any pair maps to a location with coverage < N.

I can in theory run the samtools pileup find the positions at which coverage is < N and read the bam file again and exclude the reads that are mapped to positions in my low coverage locations per chromosome.

This can be memory intensive in some cases where coverage is low and I am not sure if I find a read in low coverage region how do I make sure to exclude its mate.

Thanks! -Abhi

bam samtools sam • 9.6k views
ADD COMMENT
0
Entering edit mode

you could use samtools depth tool and then create a bed from the output. This bed could be used to filter your bam. There must be an easier way?

ADD REPLY
0
Entering edit mode

What is the workflow that requires this manipulation--just curious?

ADD REPLY
0
Entering edit mode

just want to get rid of any mapping artifact and noise. We expect regions of interest to have high coverage. This is actually the same transcript start,end data points and I would like to remove reads in a region with < N coverage. May be I should also think about physical coverage.

ADD REPLY
0
Entering edit mode

Abhi, How do yo get read a bam file and exclude the reads that are mapped to positions with low coverage?

ADD REPLY
0
Entering edit mode

first, you have to detect those low coverage regions. then, you have to filter your bam file with those regions. I would suggest you open a new question rather than commenting a previous related one.

ADD REPLY
2
Entering edit mode
12.6 years ago

I think that the idea behind this question is to generate a reduced BAM file based on coverage, leaving only the reads and regions which would actually be useful for any kind of downstream analysis Abhi may want to perform. I can only foresee 2 major set of applications for this idea, which in my opinion shouldn't be addressed this way: a) you want simplify a later coverage calculation, or b) you want to work only on regions with a coverage above certain threshold. in any of these cases, the usual proceeding is to deal directly with the entire BAM, setting filters/thresholds for the analysis to be performed on them. the tools that were designed to deal with BAM files are indeed optimized to perform these filters/thresholds when needed.

I could only understand performing such extra work if your intention is to perform a later intensive work on that BAM file, which just being significantly reduced on size would represent an interesting save of disk usage, hence you'll get reduced timings. if you still want to go for such filtering process, the easiest thing I can think of would be a first pass trying to generate a bed file with the regions of coverage above the desired threshold (bedtools' coverageBed should do the work, and it'll also be very fast), and then a second pass filtering the BAM file with those regions (samtools should do the work, and again that should be very fast indeed).

ADD COMMENT

Login before adding your answer.

Traffic: 1708 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6