Question

Merge regions in bedtools genomecov/bedgraph file

0

Entering edit mode

3.9 years ago

eoneill627 ▴ 20

I want to calculate the number of reads mapping to regions of the genome, and I want to group these reads into bins of say 1kb. I have used bedtools genomecov to generate a bedgraph file that shows the coverage of reads across the genome.

bedtools genomecov -bg -ibam file.bam >Coverage.bedgraph

I then merged nearby regions of coverage using:

bedtools merge -d 1000 -c 4 -o sum -i Coverage.bedgraph > Coverage.merged.bedgraph

The problem is that the "sum" part adds the coverage from the different regions, so when merged, reads are counted more than once.

Is there a way to merge the regions without counting reads more than once? I think the necessary information from the bam file is probably lost at this point. Is there a way to do this with an option in bedtools genomecov i.e. can you make it report regions of coverage with a specified 'bin' size?

bedgraph genomecov • 2.4k views

ADD COMMENT • link 3.9 years ago by eoneill627 ▴ 20

1

Entering edit mode

I think deeptools bamCoverage has the option "--binSize", which seems like what you want. But I was still questioning if it can solve the "counting reads several times" problem since if a read span two bins, it will still be counted twice....

ADD REPLY • link 3.9 years ago by Jianyu ▴ 580

0

Entering edit mode

It may still count it twice, but I guess the larger the bin, the smaller this issue becomes. Thanks, I'll give this a go.

ADD REPLY • link 3.9 years ago by eoneill627 ▴ 20

0

Entering edit mode

This is great. The reads are occasionally still counted twice, but the bins are large enough that this isn't a big issue. Thanks!

ADD REPLY • link 3.9 years ago by eoneill627 ▴ 20

0

Entering edit mode

Just take any other operation than sum that you find appropriate. Mean, median, min, max, choice is yours.

ADD REPLY • link 3.9 years ago by ATpoint 88k

0

Entering edit mode

I'm not sure what those operations do. I couldn't really understand from the bedtools documentation.

ADD REPLY • link 3.9 years ago by eoneill627 ▴ 20

0

Entering edit mode

No magic here, min is the minimum, max the maximum, median the median and so on. So if you bin spans three intervals with coverage 3, 5 and 9 then median would report 5 for the bin, max 9 and min 3.

ADD REPLY • link 3.9 years ago by ATpoint 88k