Question

Use bedtools to get number of shared features from intersections

8.3 years ago

themantalope ▴ 40

Hi All,

I would like to know if it is possible to use bedtools to count the number of occurrences where a feature from set A and set B overlap with a feature in set C (but don't necessarily overlap with each other).

For example, if datasets A and B are BED files from ChIP-seq experiments and C defines a set of genomic regions (for example, +/- 2 kb around TSSs), is there a way to count the number of times that a region in A and a region in B both overlap a region in C without necessarily intersecting each other?

pybedtools bedtools • 2.9k views

Updated 8.3 years ago by Alex Reynolds 36k • written 8.3 years ago by themantalope ▴ 40

To clarify, can you provide an example input and the output you want to get from it?

8.3 years ago by Ryan Dale 5.0k

score 0 · Answer 1 · 2016-09-09

8.3 years ago

harold.smith.tarheel ★ 5.0k

Use bedtools subtract to take A from B and B from A, which leaves the intervals unique to each set, then intersect each result with C.

But with this approach, how can you distinguish the case where features from both A and B overlap the same feature in C from the case where only one of the two sets has a feature overlapping it?

8.3 years ago by themantalope ▴ 40

Perhaps a better way to phrase this would be: how do I count the number of features in C that overlap both a feature in A and a feature in B?

8.3 years ago by themantalope ▴ 40

score 0 · Answer 2 · 2016-09-09

You could use BEDOPS bedmap --count with bedops --everything (multiset union) to count the overlaps of C with A and B, whether or not A and B overlap each other when overlapping C.

$ bedmap --delim '\t' --echo --count C.bed <(bedops --everything A.bed B.bed) > answer.bed

bedops --everything takes the multiset union of the A and B elements. So where an A element and a B element overlap each other and both overlap a C element, each is still counted separately (two counts). The count is also two where A overlaps C and B overlaps C, but A and B do not overlap each other.
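
Each line of answer.bed is a C region followed by a tab and its overlap count, so the file can be reduced to a single number with awk. The following is a minimal sketch, assuming the count is the last tab-separated field; the helper name `count_c_with_overlap` is made up for illustration and is not part of BEDOPS.

```shell
# Hypothetical helper (not part of BEDOPS): print how many C regions in a
# "bedmap --echo --count" output have an overlap count of at least MIN.
# Assumes tab-delimited lines with the count in the last field.
# MIN defaults to 1 when the second argument is omitted.
count_c_with_overlap() {
    awk -F'\t' -v min="${2:-1}" '$NF >= min { n++ } END { print n + 0 }' "$1"
}

# Example usage:
#   count_c_with_overlap answer.bed      # C regions with one or more overlaps
#   count_c_with_overlap answer.bed 2    # C regions with two or more overlaps
```

Note that a count of two or more does not by itself show that both A and B contributed, because two elements from the same file also give a count of two.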

If you only want to count a single overlap instance, then use bedops --merge on A and B to build a set of merged regions across A and B:

$ bedmap --delim '\t' --echo --count C.bed <(bedops --merge A.bed B.bed) > answer.bed

This approach gives a single count where A and B both overlap C and also overlap each other. It still gives a count of two where A overlaps C and B overlaps C, but A and B do not overlap each other.
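
To count only the C regions hit by at least one element from each of A and B, whether or not those elements overlap each other, one option is to count against A and B separately and compare the counts row by row. This is a minimal sketch, assuming bedmap --count is run once per file against the same sorted C.bed, so that both count files have one line per C region in the same order. The helper name `count_c_hit_by_both` and the file names A.counts and B.counts are illustrative.

```shell
# Hypothetical helper: given two files of per-region counts (one count per
# line, same C region order in both), print how many regions have a
# nonzero count in both files.
count_c_hit_by_both() {
    paste "$1" "$2" | awk -F'\t' '$1 > 0 && $2 > 0 { n++ } END { print n + 0 }'
}

# Example usage (bedmap without --echo prints only the count per C region):
#   bedmap --count C.bed A.bed > A.counts
#   bedmap --count C.bed B.bed > B.counts
#   count_c_hit_by_both A.counts B.counts
```

Keeping the A and B counts in separate columns avoids the ambiguity of a combined count, where a value of two could come from two elements of the same file.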