You could union the files, specify the overlap criterion for mapping, and then post-process the result to count true peaks. The ID field (fifth column) can be useful as an identifier. Using intervals alone is probably not enough, because of cases where peaks overlap.
For example, to union sorted BED files:
$ bedops -u exp1peaks1.bed exp1peaks2.bed ... exp1peaks10.bed > exp1peaks.bed
Then map the unioned set of peaks against each other:
$ bedmap --count --echo --echo-map-id-uniq --fraction-map 0.51 exp1peaks.bed | awk '($1>=6)' | cut -f2- > candidates.bed
- The
--count
operator returns the number of mapped peaks that overlap the reference peak.
- The
--echo
operator returns the reference peak.
- The
--echo-map-id-uniq
operator returns the unique IDs of mapped peaks that overlap the reference peak.
- The
--fraction-map 0.51
operator is the overlap criterion, which requires that more than half of a mapped peak must overlap the reference peak to be called an overlap.
The awk
and cut
statements at the end return candidate peaks, where there are six or more mapped peaks that met the bedmap
overlap criterion. The output is another sorted BED file.
To deal with this:
I saw someone say that if you have one big peak that overlaps several shorter ones, the count will be all off.
Once you have candidate peaks, you can use a Python script to look at the contents of the mapped peak IDs returned by --echo-map-id-uniq
.
Using peak IDs that follow a parseable pattern will help you count true overlaps among common peaks in candidates.bed
, filtering out those where you run into the above scenario.
The procedure could be redone on files for the second experiment. Then you can compare intervals across two experiments and do a final count.