Hi All,
I'm using bedtools v2.26.0 to combine overlapping intervals of a bed file into “merged” intervals. I have a problem with some SNP features (same start and end coordinate). These are my bed file and command line:
input.bed
chr1 70833 70833 a
chr1 70837 70837 b
chr1 70839 70839 c
chr1 71001 71001 d
$ bedtools merge -i input.bed -c 4 -o collapse > output.bed
output.bed
chr1 70833 70833 a
chr1 70836 70840 b,c
chr1 71001 71001 d
By default, overlapping and/or "book-ended" features are combined.
For my analysis, I need to be very accurate. So, I only want to merge the truly overlapping features. I need the features to remain separated if they are separated by one or two bases. So, in this case, the output should remain the same as the input because there aren't any overlapping intervals:
output.bed
chr1 70833 70833 a
chr1 70837 70837 b
chr1 70839 70839 c
chr1 71001 71001 d
Is there a way to obtain this kind of sensitivity with bedtools?
Thanks
Hello gabri,
the output of bedtools is interesting. I'm not sure whether this is a bug or by design. Nevertheless, I think your bed file doesn't represent the positions you think it does. bed uses 0-based, half-open intervals. That means it starts counting positions at 0 instead of 1, and the end position, given in the third column, isn't included. Given this, all of your intervals contain no bases. I guess your bed file should look like this:
fin swimmer
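To make the coordinate point concrete, here is a minimal sketch of one way to turn zero-length records into 1-bp, 0-based, half-open intervals. It assumes the positions in input.bed are 1-based SNP coordinates written with start equal to end; that is an assumption about the data, not something confirmed in this thread. The output name snps.bed is made up for illustration.

```shell
# Recreate the input from the question (tab-separated), assuming the
# coordinates are 1-based positions with start == end.
printf 'chr1\t70833\t70833\ta\nchr1\t70837\t70837\tb\nchr1\t70839\t70839\tc\nchr1\t71001\t71001\td\n' > input.bed

# For every record whose start equals its end, move the start back by one
# so the interval covers exactly one base in 0-based, half-open terms.
awk 'BEGIN { OFS = "\t" } $2 == $3 { $2 = $2 - 1 } { print }' input.bed > snps.bed

cat snps.bed
```

With this shift the first record becomes chr1 70832 70833 a. The converted intervals for b and c are then one base apart rather than overlapping or book-ended, so a default bedtools merge should leave them separate.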
Thank you ATpoint. It worked for me!
Hi gabri, did you find any workaround to this problem? I am in the same situation. I want to consolidate coordinates of all the Cs in my RRBS dataset. For this, I pooled (all replicates and groups) the CX-report from bismark and used bedtools merge to get unique locations. My final bed file contains regions with length > 1! I found these are repeats of C. I tried with both 0- and 1-based coordinates.
(...truncated by ATpoint to avoid overly long post, please check my answer towards the -d option in bedtools merge)
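For readers landing here, a hedged sketch of how the -d option could be applied to the book-ended case. The bedtools merge help text describes -d as the maximum distance allowed between features to be merged (default 0) and notes that negative values enforce a number of base pairs of required overlap. Whether a negative value is accepted depends on the installed version, so treat that as an assumption to verify, especially on v2.26.0. The file cpg.bed and its records are invented for illustration.

```shell
# Hypothetical data: x and y are book-ended (x ends where y starts),
# while y and z truly overlap.
printf 'chr1\t100\t101\tx\nchr1\t101\t102\ty\nchr1\t101\t105\tz\n' > cpg.bed

if command -v bedtools >/dev/null 2>&1; then
  # Assumption: this bedtools version accepts a negative -d, which asks for
  # at least 1 bp of real overlap before two features are merged.
  bedtools merge -i cpg.bed -d -1 -c 4 -o collapse
else
  echo "bedtools not found; skipping merge" >&2
fi
```

If the negative distance is honoured, x should stay on its own line while y and z collapse into one interval. If the installed version rejects it, upgrading bedtools is the first thing to try.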