how to intersect multiple bed files and get results in uncollapsed genomic intervals
1
0
Entering edit mode
18 months ago
Alewa ▴ 170

Is there anyway I can get the results on intersection of multiple bed files in uncollapsed format? i.e., the result should have interval/width of 1.

Here's the result I get in collapses form

(base) bash-4.2$ bedtools multiinter -header -names gr1 gr2 gr3 -i gr1.bed gr2.bed gr3.bed > collapsed_multiIntersect.bed

(base) bash-4.2$ cat collapsed_multiIntersect.bed
chrom   start   end     num     list    gr1     gr2     gr3
chr1    0       1       1       gr2     0       1       0
chr1    1       6       2       gr1,gr2 1       1       0
chr1    6       7       1       gr1     1       0       0
chr2    0       2       2       gr1,gr3 1       0       1
chr2    2       6       3       gr1,gr2,gr3     1       1       1
chr2    6       8       2       gr2,gr3 0       1       1
chr2    8       11      1       gr3     0       0       1

Here's example of data

(base) bash-4.2$ cat gr1.bed
chr1    2       2       1       *       0.7276748239528388      67.1151612419635
chr1    3       3       1       *       0.6065833754837513      49.11124410573393
chr1    4       4       1       *       0.17338064685463905     89.53636873047799
chr1    5       5       1       *       0.34515149844810367     44.35192057862878
chr1    6       6       1       *       0.656820923788473       91.98388278018683
chr2    1       1       1       *       0.21105075324885547     37.15807143598795
chr2    2       2       1       *       0.24114407738670707     49.36463311314583
chr2    3       3       1       *       0.8360638627782464      92.60302006732672
chr2    4       4       1       *       0.8508221041411161      44.008928118273616
chr2    5       5       1       *       0.8572670503053814      3.721454436890781

(base) bash-4.2$ cat gr2.bed
chr1    1       1       1       *       0.6819672391284257      97.34485428780317
chr1    2       2       1       *       0.481787727214396       30.3413461195305
chr1    3       3       1       *       0.3573118313215673      64.61741852108389
chr1    4       4       1       *       0.51849395618774        27.47685764916241
chr1    5       5       1       *       0.8251491808332503      83.9870688971132
chr2    3       3       1       *       0.42395976139232516     1.6344564035534859
chr2    4       4       1       *       0.6377549672033638      37.02820355538279
chr2    5       5       1       *       0.27647473104298115     87.21347157843411
chr2    6       6       1       *       0.6640123652759939      55.96593914087862
chr2    7       7       1       *       0.8544245520606637      64.64025254826993

(base) bash-4.2$ cat gr3.bed
chr2    1       1       1       *       0.9348035349976271      52.820321382023394
chr2    2       2       1       *       0.5684890144038945      92.6313241943717
chr2    3       3       1       *       0.47866105823777616     7.479210733436048
chr2    4       4       1       *       0.4196885135024786      95.4758029896766
chr2    5       5       1       *       0.4240405154414475      39.72786704543978
chr2    6       6       1       *       0.6786996794398874      60.717192641459405
chr2    7       7       1       *       0.7414848282933235      9.92186910007149
chr2    8       8       1       *       0.6762841297313571      56.700556515716016
chr2    9       9       1       *       0.19594529480673373     44.58376122638583
chr2    10      10      1       *       0.9014762006700039      95.27598922140896
bedtools bedops • 852 views
ADD COMMENT
1
Entering edit mode
18 months ago

Generally, you could use bedops --chop with some piping and process substitutions with bedmap to get count/ID/score data:

bedops --chop 1 some_regions.bed | bedmap --echo --count --echo-map-id - <(bedops --everything signal1.bed signal2.bed ... signalN.bed) > answer.bed

The file some_regions.bed could come from a simple merge operation upstream, e.g.:

bedops --merge signal1.bed signal2.bed ... signalN.bed | bedops --chop 1 - | bedmap --echo --count --echo-map-id - <(bedops --everything signal1.bed signal2.bed ... signalN.bed) > answer.bed

Files signal1.bed through signalN.bed would have a unique identifier in the ID (fourth) column ("gr1", "gr2", whatever).

Note: Your gr1.bed, gr2.bed and gr3.bed do not have correct BED coordinates. You'll need to run them through awk or similar to fix them, e.g. awk -vFS="\t" -vOFS="\t" '{ $2 -= 1; print $0; }' in.bed > out.bed or similar.

ADD COMMENT
0
Entering edit mode

thank you so much Alex!

ADD REPLY

Login before adding your answer.

Traffic: 2929 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6