Question

Produce Bed File With Gaps In Coverage From A Bam File

0

Entering edit mode

11.7 years ago

14134125465346445 ★ 3.6k

What options are there to produce a bed file with every entry being a start-end segment of the reference with a gap in coverage from a bam file?

bam • 6.3k views

ADD COMMENT • link updated 22 months ago by Ram 45k • written 11.7 years ago by 14134125465346445 ★ 3.6k

Ram · Accepted Answer · 2013-12-16

2

Entering edit mode

11.3 years ago

Aaronquinlan 12k

If you create a BEDGRAPH of coverage from your BAM file using the bedtools genomecov tool, you can simply filter for regions with 0 (or whatever minimum value) as follows:

bedtools genomecov -ibam my.sorted.bam -bg > my.sorted.bedgraph
awk '$4 < 1' my.sorted.bedgraph > my.uncovered.intevals.bedgraph

ADD COMMENT • link updated 5.3 years ago by Ram 45k • written 11.3 years ago by Aaronquinlan 12k

0

Entering edit mode

that's exactly what we do to detect poorly covered regions (and it works quite fast I must add). you may want to intersect that final bedgraph with your design bed file to get only the designed regions that were not covered enough.

ADD REPLY • link 11.3 years ago by Jorge Amigo 14k

score 1 · Accepted Answer · 2013-07-17

1

Entering edit mode

11.7 years ago

Matt LaFave ▴ 310

You could try bedtools, though you'd need to do a little intermediate processing. Use bedtools bamtobed to convert the bam file into bed format, then use bedtools complement to identify the intervals that weren't covered in the original bam file.

ADD COMMENT • link 11.7 years ago by Matt LaFave ▴ 310

Ram · Accepted Answer · 2013-07-17

First, to demonstrate the idea, make a sorted reference BED file. Let's say you're working with human data (hg19 build):

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e "select chrom, size from hg19.chromInfo" \
    | awk '{print $1"\t0\t"($2+1)}' \
    | tail -n +2 \
    | sort-bed - \
    > hg19.ref.bed

$ head hg19.ref.bed
chr1     0    249250622
chr10    0    135534748
chr11    0    135006517
...

(You can use any sorted reference coordinates you want — I am using sorted coordinates for the hg19 reference genome for demonstration purposes.)

Next, convert your BAM file to BED:

$ bam2bed < myReads.bam > myReads.bed

Finally, run the reads and reference files through BEDOPS bedops --partition and bedops --not-element-of to extract regions-of-interest:

$ bedops --partition hg19.ref.bed myReads.bed \
    | bedops --not-element-of -1 - myReads.bed \
    > answer.bed

The file answer.bed will be a sorted BED file that contains regions from the reference file hg19.ref.bed, with gaps where the BAM-sourced reads are located.

(Briefly, the way this works is that the --partition operation splits the reference and read regions into disjoint subsets. The --not-element-of operation filters out disjoint subsets that are specifically from the reads set.)