bed file intersection overlapping regions in at least x files
8.3 years ago
User6891 ▴ 330

Hi,

I have a set of 20 .bed files. I want to compare them and create a new .bed file that contains the minimal overlapping regions shared by at least 5 files. In other words, if a region overlaps in at least five files, I want the minimal overlapping region to be reported in the new file.

I know how to work with bedtools, and I also saw its multi-intersect tool (multiinter), but it doesn't seem to do what I want.

It would be great to receive some help.

bed regions • 2.9k views
8.3 years ago

Here is a general approach with BEDOPS bedmap --count, which generalizes to n internally-disjoint input files:

$ bedops --everything 1.bed 2.bed 3.bed ... 20.bed \
    | bedmap --count --echo --delim '\t' - \
    | uniq \
    | awk -v overlaps=5 '$1 >= overlaps' \
    | cut -f2- \
    > common.bed

By default, the bedmap step counts any overlap of one or more bases. You can make this threshold more stringent with bedmap's overlap options, such as --bp-ovr or --fraction-both.

By changing the test in the awk statement, this approach can be modified to return other subsets of the input's power set, e.g., all elements common to exactly, less than, or greater than n inputs.
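As a small illustration of swapping the awk test, here is a sketch that runs on made-up lines in the count-then-BED layout that `bedmap --count --echo --delim '\t'` writes. The sample coordinates and the helper names `sample` and `filter_by_count` are hypothetical, there only to make the example self-contained.

```shell
#!/bin/sh
# Hypothetical sample data: column 1 is the overlap count,
# columns 2-4 are the BED element (chrom, start, end).
sample() {
    printf '2\tchr1\t100\t200\n'
    printf '5\tchr1\t150\t300\n'
    printf '7\tchr1\t400\t500\n'
}

# filter_by_count TEST N: keep lines for which the awk TEST is true
# (n is set to N), then drop the count column as in the pipeline above.
filter_by_count() {
    awk -v n="$2" "$1" | cut -f2-
}

sample | filter_by_count '$1 >= n' 5   # at least 5
sample | filter_by_count '$1 == n' 5   # exactly 5
sample | filter_by_count '$1 < n' 5    # fewer than 5
```

Only the test string changes between the three calls; the rest of the pipeline stays the same.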

Once you have the elements which meet your overlap threshold, you can then process those elements to get their overlapping regions via a final operation:

$ bedops --partition common.bed \
    | bedmap --count --echo --delim '\t' - common.bed \
    | awk '$1 >= 2' \
    | cut -f2- \
    > overlaps.bed
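To make the goal concrete (regions covered by at least k of the inputs), here is a rough sketch that does not use BEDOPS at all. It swaps in a plain sort/awk sweep over start and end events. It assumes tab-delimited BED input with at least three columns, and that each file is internally disjoint, so coverage depth equals the number of files covering a base. The function name `regions_covered` is hypothetical.

```shell
#!/bin/sh
# regions_covered K FILE...: print BED intervals covered by at least K
# input intervals (a sweep-line sketch, not the BEDOPS pipeline).
regions_covered() {
    k=$1; shift
    # Emit a +1 event at each interval start and a -1 event at each end.
    awk 'BEGIN { OFS = "\t" } { print $1, $2, 1; print $1, $3, -1 }' "$@" \
        | LC_ALL=C sort -k1,1 -k2,2n -k3,3nr \
        | awk -v k="$k" 'BEGIN { OFS = "\t" }
            $1 != chrom { chrom = $1; depth = 0 }   # reset on a new chromosome
            {
                pos = $2 + 0
                prev = depth
                depth += $3
                # Depth rises to K: a qualifying region opens here.
                if (prev < k && depth >= k) start = pos
                # Depth falls below K: close the region, skipping empty ones.
                if (prev >= k && depth < k && pos > start) print chrom, start, pos
            }'
}
```

Calling it as `regions_covered 5` followed by the input files would print the stretches where five or more files overlap. Starts sort before ends at the same coordinate, so abutting qualifying stretches come out merged.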

If your inputs are not internally disjoint (that is, if some elements may overlap within any one of the 20 input files), you might instead apply some ID-based tricks I describe in my answer to "Intersect multiple BED files".
