Hello, I was wondering what would be the best tool to identify and plot a venn of number of bp overlap between more than 3 bed files (similar to bedtools intersect -wo for 2 files). I have read all the related threads on multiInter and bedops. I also tried Homer mergePeaks --venn option and Intervene module that uses pyBedtools. Most of the tools seem to output only the number of overlapping features and not number of bp overlap. Please advise, thank you for your help.
Hi Alex, I am a little confused with the output of bedmap. Is the 1st column number of overlaps and 2nd column bp overlap?
Is there a way to also see which files have these overlaps? Thank you!!
Each line in your output represents a partition. By partition, I mean a disjoint subset of the union of all your input BED files. Each disjoint subset is a unique genomic interval.
A partition does not overlap another partition. So you can do counting operations on each partition and know you're getting a unique answer.
Take a look at the
--partition
operator in thebedops
documentation if you want a graphical explanation of what this means.The first column is the number of files from which overlaps are derived. If you have three files, you could get overlaps from 1, 2 or 3 files. If you have four files, you could get overlaps from 1, 2, 3, or 4 files. And so on.
So the first column tells you how many files. The second column tells you how many unique bases overlap those n files. In your example, for instance, you have two files that overlap for 153+88=241 unique bases. You have one file that has 70 unique bases.
The job of
awk
is to sum those unique bases forN
sets. If you start with three files,awk
will sum the number of unique bases where there are overlaps between all three sets.If you want to see which files provide the overlaps at each partition, add a unique ID value to each of
A.bed
,B.bed
etc. This identifier should go into the ID column (fifth column). That identifier should be unique to that file; you could useA
,B
, etc. for instance.Once you have added unique IDs to your inputs, you can add the
--echo-map-id-uniq
operator to thebedmap
command to get those identifiers in a separate column.Thanks for the explanation Alex, I understand the method now. I have 3 files A.bed B.bed C.bed and when I do the last step -
The resulting file had values of 4 and 5 in column 1, which from your explanation is the number of files with overlaps, which cannot be 4 or 5.
Also the awk command above gives me the error -
awk: cmd. line:1: fatal: cannot open file `($1==N){ s += $2; } END { print s; }' for reading (No such file or directory)
I did replace the N=123 with N=3. Please let me know if I am missing something here. Thank you.
Can you post your files somewhere? I'd be happy to take a look.
You might also check that your inputs are sorted per
sort-bed
, as non-sort-bed sorted inputs could give odd answers for the partition or union.Thank you for your help Alex! I got the bedops and awk commands to work, as well as ended up using the 'intervene pairwise' module which gave me significance of overlap between bed files.
Glad it helped, and thanks for using BEDOPS!