Question

bed file intersection overlapping regions in at least x files

8.3 years ago

User6891 ▴ 330

Hi,

I have a set of 20 .bed files. I want to compare them and create a new .bed file that contains the minimal overlapping regions shared by at least 5 files. That is, if a region overlaps in at least five files, I want the minimal overlapping region to be reported in the new file.

I know how to work with bedtools, I also saw the option multi-intersect, but this doesn't seem to be able to do what I want.

It would be great to receive some help.

bed regions • 2.9k views

updated 8.3 years ago by Alex Reynolds 36k • written 8.3 years ago by User6891 ▴ 330

score 0 · Answer 1 · 2016-08-10

Here is a general approach with BEDOPS bedmap --count, which generalizes to n internally-disjoint input files:

$ bedops --everything 1.bed 2.bed 3.bed ... 20.bed \
    | bedmap --count --echo --delim '\t' - \
    | uniq \
    | awk -voverlaps=5 '$1 >= overlaps' \
    | cut -f2- \
    > common.bed

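By default, the bedmap step counts an element as overlapping if it shares one or more bases. You can make this threshold more stringent with bedmap's overlap options, such as --bp-ovr (a minimum number of overlapping bases) or --fraction-ref, --fraction-map, --fraction-both and --fraction-either (a minimum fraction of overlap).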

By changing the test in the awk statement, this approach can be modified to return other subsets of the input's power set, e.g., all elements common to exactly, fewer than, or more than n inputs.
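As a rough illustration of those variants, here is a minimal sketch that applies each test to a few made-up lines shaped like `bedmap --count --echo` output (a count column followed by the BED element). The sample data and the variable names `counted` and `n` are assumptions for the example only.

```shell
# Hypothetical sample: count in column 1, then the BED element.
counted=$(printf '3\tchr1\t10\t50\n5\tchr1\t30\t70\n7\tchr1\t40\t100\n')

# Elements whose count is exactly n
printf '%s\n' "$counted" | awk -v n=5 '$1 == n' | cut -f2-

# Elements whose count is fewer than n
printf '%s\n' "$counted" | awk -v n=5 '$1 < n' | cut -f2-

# Elements whose count is more than n
printf '%s\n' "$counted" | awk -v n=5 '$1 > n' | cut -f2-
```

Only the awk test changes between the three pipelines; the cut step strips the count column in each case, as in the original command.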

Once you have the elements which meet your overlap threshold, you can then process those elements to get their overlapping regions via a final operation:

$ bedops --partition common.bed \
    | bedmap --count --echo --delim '\t' - common.bed \
    | awk '$1 >= 2' \
    | cut -f2- \
    > overlaps.bed
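For comparison, the same goal (regions covered by at least a given number of files) can also be sketched without BEDOPS, using a plain sort/awk sweep over interval start and end events. This is a different technique from the bedmap pipeline above, not a replacement for it. It assumes three-column, tab-delimited BED input and internally disjoint files, so that coverage depth equals the number of files. The function name `regions_min_depth` is hypothetical.

```shell
# Sketch: print regions covered by at least min_files of the given BED files.
# Assumes each input file is internally disjoint (depth == number of files).
# regions_min_depth is a hypothetical helper, not a BEDOPS or bedtools command.
regions_min_depth() {
    min_files=$1
    shift
    # Emit a +1 event at every interval start and a -1 event at every end.
    awk 'BEGIN { OFS = "\t" } { print $1, $2, 1; print $1, $3, -1 }' "$@" \
        | LC_ALL=C sort -k1,1 -k2,2n -k3,3n \
        | awk -v k="$min_files" 'BEGIN { OFS = "\t"; depth = 0 }
            {
                before = depth
                depth += $3
                # Depth rises to the threshold: a qualifying region opens here.
                if (before < k + 0 && depth >= k + 0) start = $2
                # Depth drops below the threshold: close and report the region.
                if (before >= k + 0 && depth < k + 0 && $2 > start) print $1, start, $2
            }'
}

# Example call (file names are placeholders):
# regions_min_depth 5 1.bed 2.bed 3.bed > overlaps.bed
```

At a shared coordinate, end events sort before start events, which matches BED's half-open intervals: an element ending at position 100 does not overlap one starting at 100. As a consequence, two qualifying regions that merely abut are reported separately rather than merged.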

If your inputs are not internally disjoint (that is, if some elements may overlap within any one of the 20 input files), you might instead apply some ID-based tricks I describe in my answer to "Intersect multiple BED files".