I have BED files for two types of coordinates, A and B. I want to collapse them into intervals in which each entry of A lies no more than 20 bases from an entry of B, and vice versa. What I do not want is a merge that joins two or more entries from A alone, or two or more entries from B alone, just because they are close to each other.
For example, let's say:
cat A.bed
Chr1 10 20 A1
Chr1 50 60 A2
Chr1 75 100 A3
and
cat B.bed
Chr1 25 40 B1
Chr1 115 160 B2
Now
cat A.bed B.bed > AB.bed
sortBed -i AB.bed > AB_sort.bed
The output from mergeBed with -d 20 gives what I do NOT want!
mergeBed -d 20 -i AB_sort.bed -c 4 -o collapse -delim ","
Chr1 10 160 A1,B1,A2,A3,B2
The picture looks different when I inspect the bedtools closest output below by eye:
bedtools closest -d -a A.bed -b B.bed
Chr1 10 20 A1 Chr1 25 40 B1 6
Chr1 50 60 A2 Chr1 25 40 B1 11
Chr1 75 100 A3 Chr1 115 160 B2 16
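To make the three distances above explicit, here is a minimal Python sketch that recomputes them for every A/B pair in the example. The helper name `reported_distance` is made up for illustration, and the "gap + 1" convention is an assumption chosen only because it reproduces the 6, 11 and 16 shown in the output above (the raw gaps are 5, 10 and 15).

```python
from itertools import product

# Example intervals copied from A.bed and B.bed above.
A = [("Chr1", 10, 20, "A1"), ("Chr1", 50, 60, "A2"), ("Chr1", 75, 100, "A3")]
B = [("Chr1", 25, 40, "B1"), ("Chr1", 115, 160, "B2")]

MAX_DIST = 20


def reported_distance(x, y):
    """Distance under the assumed convention: 0 on overlap, else raw gap + 1."""
    raw_gap = max(x[1], y[1]) - min(x[2], y[2])
    return 0 if raw_gap < 0 else raw_gap + 1


# Keep only cross-type (A, B) pairs on the same chromosome within MAX_DIST.
links = [
    (a[3], b[3], reported_distance(a, b))
    for a, b in product(A, B)
    if a[0] == b[0] and reported_distance(a, b) <= MAX_DIST
]

for name_a, name_b, dist in links:
    print(name_a, name_b, dist, sep="\t")
```

On this example only the pairs A1/B1, A2/B1 and A3/B2 pass the threshold, whichever of the two distance conventions is used.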
What combination of bedtools sub-commands should I use to get the kind of result I am after, shown below?
AB_dist20_merge.bed
Chr1 10 60 A1,B1,A2
Chr1 75 160 A3,B2
(Thanks to Alex for pointing out an error: I changed A3,B1 to A3,B2 in the line above.)
In my manually derived output above, I do NOT collapse the coordinates for features A2 and A3 together, even though they are within 20 of each other, because both come from A (the rule stated in the intro); mergeBed is agnostic of this. Any ideas on how to do this with bedtools and/or BEDOPS?
Note to self: images (with and without annotation) that help visualize this example have been added below.
Raw Image for example
Annotated image for example
Can you explain the A3,B1 part of your expected result? That doesn't make sense in the context of your question.
You can do this with bedmap very easily, I think, but I need to understand if your expected output is correct, first.
Thank you for pointing that out, Alex. I fixed the error above. Look forward to your response. Thanks, in advance.
I think this is complicated enough to warrant a custom python/perl/R/ruby/haskell/FORTRAN program :-)
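For what it's worth, here is a minimal Python sketch of what such a custom program could look like. It is not a bedtools or bedmap solution. It treats the entries as nodes of a graph, links only A/B pairs (never A/A or B/B) that are close enough, and reports each connected group as one interval. All function names are made up for illustration. Two assumptions are baked in: distance means the raw gap between end and start, and an entry with no partner within range is reported on its own. The pairwise loop is quadratic, so this is meant for small files only.

```python
from collections import defaultdict

A_BED = """Chr1 10 20 A1
Chr1 50 60 A2
Chr1 75 100 A3"""

B_BED = """Chr1 25 40 B1
Chr1 115 160 B2"""


def parse_bed(text, kind):
    """Parse 4-column BED text into (chrom, start, end, name, kind) tuples."""
    rows = []
    for line in text.strip().splitlines():
        chrom, start, end, name = line.split()[:4]
        rows.append((chrom, int(start), int(end), name, kind))
    return rows


def gap(x, y):
    """Raw gap between two intervals (assumed convention); 0 if they overlap or touch."""
    return max(0, max(x[1], y[1]) - min(x[2], y[2]))


def merge_cross_type(a_rows, b_rows, max_dist=20):
    """Group entries connected through A-B links and merge each group."""
    rows = sorted(a_rows + b_rows, key=lambda r: (r[0], r[1], r[2]))
    parent = list(range(len(rows)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link only pairs of different type on the same chromosome within max_dist.
    for i, x in enumerate(rows):
        for j in range(i + 1, len(rows)):
            y = rows[j]
            if x[0] == y[0] and x[4] != y[4] and gap(x, y) <= max_dist:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[find(i)].append(row)  # members stay in sorted order

    merged = [
        (
            members[0][0],
            min(m[1] for m in members),
            max(m[2] for m in members),
            ",".join(m[3] for m in members),
        )
        for members in groups.values()
    ]
    return sorted(merged)


result = merge_cross_type(parse_bed(A_BED, "A"), parse_bed(B_BED, "B"))
for chrom, start, end, names in result:
    print(chrom, start, end, names, sep="\t")
```

On the example from the question this prints Chr1 10 60 A1,B1,A2 and Chr1 75 160 A3,B2: A2 and A3 stay apart because no B entry bridges them.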