I have two bed files - they correspond to peak intervals for ATAC-seq of different samples.
I want to parse out the peak intervals unique to one file, as well as quantify the similarities between the two files. How can I do this with BedToools?
I have two bed files - they correspond to peak intervals for ATAC-seq of different samples.
I want to parse out the peak intervals unique to one file, as well as quantify the similarities between the two files. How can I do this with BedToools?
Via BEDOPS:
$ bedops --not-element-of 100% A.bed B.bed > elements_unique_to_A.bed
$ bedops --not-element-of 100% B.bed A.bed > elements_unique_to_B.bed
To measure similarity, you could calculate the Jaccard index based on cardinality of intersection and union of sets A and B.
First, calculate cardinalities:
$ AnB=`bedops --element-of 1 A.bed B.bed | wc -l`
$ BnA=`bedops --element-of 1 B.bed A.bed | wc -l`
$ AuB=`bedops --everything A.bed B.bed | wc -l`
Then calculate the index:
$ calc "(${AnB}+${BnA})/(2*${AuB))"
You can and might likely want to adjust the overlap criteria passed to --element-of
to make set membership more stringent than one base of overlap between peaks. For example, you might specify a minimum of 50%
overlap between a pair of peaks to call them "similar". Or you'd calculate some summary statistics about peaks and require a minimum of one SD of overlap (however many bases that is for your datasets), etc.
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Hello a.rex ,
what's the problem with bedtools intersect?
fin swimmer
Look at bedtools intersect (similarities) and bedtools subtract (unique peaks).