Use bedtools to get the number of shared features from intersections
8.2 years ago
themantalope

Hi All,

I would like to know if it is possible to use bedtools to count the number of occurrences where a feature from set A and a feature from set B both overlap the same feature in set C (but don't necessarily overlap each other).

For example, if datasets A and B are BED files from ChIP-seq experiments and C defines a set of genomic regions (for example, ±2 kb around TSSs), is there a way to count the number of times that a region in A and a region in B both overlap the same region in C, even when the A and B regions don't intersect each other?
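To pin down the count being asked for, here is a small brute-force Python sketch. It is not bedtools or pybedtools; the function names and toy intervals are hypothetical. It counts C regions overlapped by at least one A feature and at least one B feature, whether or not those A and B features overlap each other, assuming BED-style half-open, zero-based coordinates and a 1 bp minimum overlap.

```python
# Hypothetical brute-force sketch: count regions in C that overlap at least one
# feature in A AND at least one feature in B. The A and B features need not
# overlap each other. Assumes BED-style (chrom, start, end) tuples with
# half-open, zero-based coordinates and a 1 bp minimum overlap.
from typing import List, Tuple

Interval = Tuple[str, int, int]


def overlaps(x: Interval, y: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share >= 1 base."""
    return x[0] == y[0] and x[1] < y[2] and y[1] < x[2]


def count_c_hit_by_a_and_b(c: List[Interval], a: List[Interval], b: List[Interval]) -> int:
    """Number of C regions overlapped by some A feature and by some B feature."""
    return sum(
        1
        for region in c
        if any(overlaps(region, f) for f in a) and any(overlaps(region, f) for f in b)
    )


# Toy data (made up): two 4 kb windows on chr1.
c = [("chr1", 0, 4000), ("chr1", 10000, 14000)]
a = [("chr1", 100, 200), ("chr1", 10500, 10600)]  # an A peak in each window
b = [("chr1", 3000, 3100)]                          # a B peak in the first window only
print(count_c_hit_by_a_and_b(c, a, b))  # 1: only the first window has both
```

In the first window the A and B peaks do not touch each other, but the window is still counted, which is the behavior the question describes.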

Tags: pybedtools, bedtools

To clarify, can you provide an example input and the output you want to get from it?

8.2 years ago

Use bedtools subtract to remove A from B and B from A, which gives the intervals unique to each set, then intersect those results with C.

But with this approach, how can you distinguish a case where a feature from A and a feature from B both overlap the same feature in C from a case where a feature from only one of the two sets overlaps a feature in C?

Perhaps a better way to phrase the question: how do I count the number of features in C that overlap at least one feature in A and at least one feature in B?

8.2 years ago

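You could use BEDOPS bedmap --count with bedops --everything (multiset union) to count the overlaps of each element of C with A and B, whether or not A and B overlap each other where they overlap C. BEDOPS tools expect sorted BED input, so run the files through sort-bed first if they are not already sorted.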

$ bedmap --delim '\t' --echo --count C.bed <(bedops --everything A.bed B.bed) > answer.bed

Because bedops --everything takes the multiset union of A and B elements, an A element and a B element that overlap each other and also overlap a C element are counted separately (a count of two). The count is also two when A and B each overlap C but do not overlap each other.
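A minimal pure-Python sketch of the per-region counts this command should produce, assuming bedmap's default overlap criterion of at least 1 bp and half-open BED coordinates. It only mimics the counting logic and is not BEDOPS itself; the helper names and toy intervals are hypothetical. The first C window holds an A/B pair that overlap each other, the second holds a disjoint A/B pair, and both come out as 2.

```python
# Hypothetical approximation of the counts reported by
#   bedmap --echo --count C.bed <(bedops --everything A.bed B.bed)
# Assumes (chrom, start, end) half-open intervals and a >= 1 bp overlap rule.
from typing import List, Tuple

Interval = Tuple[str, int, int]


def overlaps(x: Interval, y: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share >= 1 base."""
    return x[0] == y[0] and x[1] < y[2] and y[1] < x[2]


def count_against_union(c: List[Interval], a: List[Interval], b: List[Interval]) -> List[int]:
    """For each C region, count overlapping elements of the multiset union A + B."""
    union = a + b  # multiset union: every A and B element is kept, nothing merged
    return [sum(1 for f in union if overlaps(region, f)) for region in c]


# Toy data (made up)
c = [("chr1", 0, 4000), ("chr1", 10000, 14000)]
a = [("chr1", 100, 300), ("chr1", 10100, 10200)]
b = [("chr1", 200, 400), ("chr1", 12000, 12100)]  # overlaps A in window 1 only
print(count_against_union(c, a, b))  # [2, 2]
```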

If you only want to count a single overlap instance when A and B overlap each other, use bedops --merge on A and B to build a set of merged regions across A and B:

$ bedmap --delim '\t' --echo --count C.bed <(bedops --merge A.bed B.bed) > answer.bed

This gives a count of one where A and B both overlap C and also overlap each other, and a count of two where A and B each overlap C but do not overlap each other.
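The same toy data run through a merge step, again as a hypothetical pure-Python approximation rather than BEDOPS itself. The merge joins overlapping and adjoining (book-ended) intervals, which is assumed here to match how bedops --merge flattens elements; check the BEDOPS documentation if book-ended features matter for your data.

```python
# Hypothetical approximation of
#   bedmap --echo --count C.bed <(bedops --merge A.bed B.bed)
# Assumes (chrom, start, end) half-open intervals, a >= 1 bp overlap rule, and
# (assumption) that adjoining intervals are merged, as in this sketch.
from typing import List, Tuple

Interval = Tuple[str, int, int]


def overlaps(x: Interval, y: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share >= 1 base."""
    return x[0] == y[0] and x[1] < y[2] and y[1] < x[2]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Flatten overlapping or adjoining intervals into disjoint regions."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))  # extend current region
        else:
            merged.append((chrom, start, end))  # start a new region
    return merged


def count_against_merged(c: List[Interval], a: List[Interval], b: List[Interval]) -> List[int]:
    """For each C region, count overlapping regions of merge(A + B)."""
    flat = merge(a + b)
    return [sum(1 for f in flat if overlaps(region, f)) for region in c]


# Toy data (made up), same as the --everything sketch
c = [("chr1", 0, 4000), ("chr1", 10000, 14000)]
a = [("chr1", 100, 300), ("chr1", 10100, 10200)]
b = [("chr1", 200, 400), ("chr1", 12000, 12100)]
print(count_against_merged(c, a, b))  # [1, 2]
```

The overlapping A/B pair in the first window collapses into one merged region and reports 1, while the disjoint pair in the second window still reports 2.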
