Question

Using BedTools to look at unique bed intervals between two files

1

Entering edit mode

6.2 years ago

a.rex ▴ 350

I have two bed files - they correspond to peak intervals for ATAC-seq of different samples.

I want to parse out the peak intervals unique to one file, as well as quantify the similarities between the two files. How can I do this with BedToools?

bedtools • 2.6k views

ADD COMMENT • link updated 6.2 years ago by Alex Reynolds 36k • written 6.2 years ago by a.rex ▴ 350

0

Entering edit mode

Hello a.rex ,

what's the problem with bedtools intersect?

fin swimmer

ADD REPLY • link 6.2 years ago by finswimmer 16k

0

Entering edit mode

Look at bedtools intersect (similarities) and bedtools subtract (unique peaks).

ADD REPLY • link 6.2 years ago by h.mon 35k

score 3 · Answer 1 · 2018-10-30

Via BEDOPS:

$ bedops --not-element-of 100% A.bed B.bed > elements_unique_to_A.bed
$ bedops --not-element-of 100% B.bed A.bed > elements_unique_to_B.bed

To measure similarity, you could calculate the Jaccard index based on cardinality of intersection and union of sets A and B.

First, calculate cardinalities:

$ AnB=`bedops --element-of 1 A.bed B.bed | wc -l`
$ BnA=`bedops --element-of 1 B.bed A.bed | wc -l`
$ AuB=`bedops --everything A.bed B.bed | wc -l`

Then calculate the index:

$ calc "(${AnB}+${BnA})/(2*${AuB))"

You can and might likely want to adjust the overlap criteria passed to --element-of to make set membership more stringent than one base of overlap between peaks. For example, you might specify a minimum of 50% overlap between a pair of peaks to call them "similar". Or you'd calculate some summary statistics about peaks and require a minimum of one SD of overlap (however many bases that is for your datasets), etc.