Here's a one-liner with another toolkit:
$ bedops -m A.bed B.bed | bedmap --echo-map --exact - <(bedops -u A.bed B.bed) | sort-bed - > answer.bed
Or if you only have one file:
$ bedops -m C.bed | bedmap --echo-map --exact - C.bed > answer.bed
It is assumed A.bed
and B.bed
and C.bed
are sorted before set operations, i.e.:
$ sort-bed A.unsorted.bed > A.bed
$ sort-bed B.unsorted.bed > B.bed
$ sort-bed C.unsorted.bed > C.bed
Demo:
$ more C.bed
chr1 100 120 id-1
chr1 110 112 id-2
chr1 114 117 id-3
chr1 125 130 id-4
$ bedops -m C.bed | bedmap --echo-map --exact - C.bed
chr1 100 120 id-1
chr1 125 130 id-4
Another option is to filter the input file for purely nested elements.
Nested elements are a pain, but there are ways to deal with them.
Start with a utility awk
script:
#!/usr/bin/env awk -f
{
if (NR > 1) {
currentChr = $1
currentStart = $2
currentStop = $3
if ((previousStart < currentStart) && (previousStop > currentStop)) {
print $0;
}
else {
previousChr = currentChr
previousStart = currentStart
previousStop = currentStop
}
}
else {
previousChr = $1
previousStart = $2
previousStop = $3
}
}
Let's call that script getNestedElements.awk
. Make sure it is executable (chmod u+x ./getNestedElements.awk
).
You can use this script to identify elements in A2.bed
that are nested, and you would then pipe them to bedops --not-element-of 100%
to filter them out:
$ more A2.bed
chr1 80 101 id-0
chr1 100 120 id-1
chr1 110 112 id-2
chr1 114 117 id-3
chr1 125 130 id-4
$ ./getNestedElements.awk A2.bed | bedops --not-element-of 100% A2.bed -
chr1 80 101 id-0
chr1 100 120 id-1
chr1 125 130 id-4
There is only one file to begin with. I tried specifying the file twice for A and B and ended up getting every entry as a duplicate from the original file.
Pls see revised answer.
Thanks, getting closer. Your example seems to be working flawlessly (I have recreated A.bed and tried the command), but when I try it on my annotation, I lose any features that overlap with each other. Consider your example, if we were to add "chr1 80-101 id-0", we would lose all entries but the last one. We should be keeping id-0, id-1 and id-4, since only id-2 and id-3 are completely covered by another feature, id-0 and id-1 merely overlap but do not cover each other 100%.
Pls see revised answer.
Eureka! Works amazing. Indeed very important to not forget sort-bed!
Yes, sorting is very important. It only needs to be done once, however, so hopefully the pain is minimal.
The bedops one-liner didn't work correctly with my input file (out.bed). Bedops version is 2.4.26
And the awk script doesn't seem to be working either. (To be sure, I used the same A2.bed file as above). The awk script is executable.
Instead of:
Try:
Still fails: