Question

bedtools, merge function: avoid merging intervals if separated by a single base

0

5.9 years ago

gabri ▴ 60

Hi All,

I'm using bedtools v2.26.0 to combine overlapping intervals of a BED file into “merged” intervals. I have a problem with some SNP features (same start and end coordinate). Here are my BED file and command line:

input.bed

chr1  70833  70833  a
chr1  70837  70837  b
chr1  70839  70839  c
chr1  71001  71001  d

$ bedtools merge -i input.bed -c 4 -o collapse > output.bed

output.bed

chr1  70833  70833  a
chr1  70836  70840  b,c
chr1  71001  71001  d

By default, overlapping and/or "book-ended" features are combined.

For my analysis, I need to be very accurate, so I only want to merge truly overlapping features. Features that are separated by one or two bases must remain separate. In this case, the output should therefore be identical to the input, because none of the intervals overlap:

output.bed

chr1  70833  70833  a
chr1  70837  70837  b
chr1  70839  70839  c
chr1  71001  71001  d

Is there a way to obtain this kind of sensitivity with bedtools?

Thanks

bedtools merge single_base_separation • 5.5k views

ADD COMMENT • link updated 4.8 years ago by Ekalavya ▴ 10 • written 5.9 years ago by gabri ▴ 60

1

Hello gabri,

The output of bedtools is interesting. I'm not sure whether this is a bug or by design.

Nevertheless, I think your BED file doesn't represent the positions you think it does. BED uses 0-based, half-open intervals: positions are counted from 0 instead of 1, and the end position given in the third column isn't included. That said, all of your given intervals contain no bases.

I guess your bed file should look like this:

chr1  70832  70833  a
chr1  70836  70837  b
chr1  70838  70839  c
chr1  71000  71001  d
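
A minimal sketch of that conversion, assuming the input is tab-separated and every feature is a single-base SNP with identical start and end. The output name `input.0based.bed` is only a placeholder, not something from the original post:

```shell
# Recreate the 1-based, start == end input from the question (tab-separated).
printf 'chr1\t70833\t70833\ta\nchr1\t70837\t70837\tb\nchr1\t70839\t70839\tc\nchr1\t71001\t71001\td\n' > input.bed

# Subtract 1 from the start column so each SNP becomes a 1 bp,
# 0-based, half-open interval; the end column is left unchanged.
awk 'BEGIN { FS = OFS = "\t" } { $2 = $2 - 1; print }' input.bed > input.0based.bed

cat input.0based.bed
```

This only makes sense for single-base features given as 1-based positions. Intervals that are already in BED convention should not be shifted.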

fin swimmer

ADD REPLY • link 5.9 years ago by finswimmer 16k

1

Thank you ATpoint. It worked for me!

ADD REPLY • link 4.8 years ago by Ekalavya ▴ 10

0

Hi gabri, did you find any workaround for this problem? I am in the same situation. I want to consolidate the coordinates of all the Cs in my RRBS dataset. For this, I pooled the CX-report from bismark (all replicates and groups) and used bedtools merge to get unique locations. My final BED file contains regions with length > 1! I found that these are runs of consecutive Cs. I tried with both 0- and 1-based coordinates.

(...truncated by ATpoint to avoid overly long post, please check my answer towards the -d option in bedtools merge)

ADD REPLY • link updated 4.8 years ago by ATpoint 85k • written 4.8 years ago by Ekalavya ▴ 10

score 2 · Answer 1 · 2020-01-31

2

4.8 years ago

ATpoint 85k

Check the -d option of bedtools merge. If you provide a negative integer, it sets the minimum number of bases that must overlap before two intervals are merged, which should be what you need. For the top-level question, -d -1 should do the trick.
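
As a rough sketch, the call could look like the following. It reuses the file names from the question but assumes the 0-based, half-open coordinates suggested in the comments above. The `command -v` guard is only there so the snippet does not fail where bedtools is not installed. How negative `-d` values are handled may depend on the bedtools version, so it is worth checking `bedtools merge -h` for your installation:

```shell
# 0-based, half-open version of the SNP positions from the question (tab-separated, sorted).
printf 'chr1\t70832\t70833\ta\nchr1\t70836\t70837\tb\nchr1\t70838\t70839\tc\nchr1\t71000\t71001\td\n' > input.bed

if command -v bedtools >/dev/null 2>&1; then
    # -d -1: require at least 1 bp of real overlap before merging,
    # so book-ended or nearby features are left as separate intervals.
    bedtools merge -i input.bed -d -1 -c 4 -o collapse > output.bed
    cat output.bed
else
    echo "bedtools not found on PATH; skipping merge" >&2
fi
```

None of these four intervals overlap, so the intent is that each one stays on its own line in `output.bed`.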

ADD COMMENT • link 4.8 years ago by ATpoint 85k