EDIT: I made a script that can parse the venn.txt output of HOMER mergePeaks for comparisons of 2 to 5 peak files (bed files) and automatically create Venn Diagrams. All you need to do is pass it a sample ID (e.g. "ABC") and the venn.txt file output by HOMER, it will create the plot in the same directory as the venn.txt file. This uses R and the VennDiagram package. Script is located here: https://github.com/stevekm/Bioinformatics/blob/master/HOMER_mergePeaks_multiVenn/multi_peaks_Venn.R
EDIT2: I also posted an implementation of this with Upset plots, which allows for >5 comparison categories, here: Visualization of ChIP-Seq peak overlaps using HOMER mergePeaks and UpSetR
EDIT3: scripts have been moved here
In my experience it is easier to just count the number of entries (lines) in each of the bed files output by HOMER mergePeaks and pass these values to R for plotting, instead of trying to parse the venn.txt file. This is the script I am using for this purpose (including the bash mergePeaks commands). You should be able to easily modify it to add more entries
#!/bin/bash
# BED files with the peaks to overlap
tmp_outH3K4ME3="peaks_H3K4ME3.bed"
tmp_outH3K27AC="peaks_H3K27AC.bed"
# a sample ID
tmp_sampleID="ABC"
# HOMER mergePeaks
mergePeaks "$tmp_outH3K4ME3" "$tmp_outH3K27AC" -prefix mergepeaks -venn mergepeaks_venn
# the mergePeaks file outputs names:
tmp_mergeH3K4ME3="mergepeaks_${tmp_outH3K4ME3}"
tmp_mergeH3K27AC="mergepeaks_${tmp_outH3K27AC}"
# count the number of unique peaks
num_H3K4ME3=$(tail -n +2 $tmp_mergeH3K4ME3 | wc -l)
echo "num_H3K4ME3 is $num_H3K4ME3"
num_H3K27AC=$(tail -n +2 $tmp_mergeH3K27AC | wc -l)
echo "num_H3K27AC is $num_H3K27AC"
# count the number of peaks in common
num_overlap=$(tail -n +2 "mergepeaks_${tmp_outH3K4ME3}_${tmp_outH3K27AC}" | wc -l)
# plot the values in a pairwise venn in R
# # make sure the correct version of R is loaded:
module unload r
module load r/3.2.0
Rscript --slave --no-save --no-restore - "$tmp_sampleID" "$num_H3K4ME3" "$num_H3K27AC" "$num_overlap" <<EOF
## R code
# load packages
library('VennDiagram')
library('gridExtra')
# get script args, print them to console
args <- commandArgs(TRUE); cat("Script args are:\n"); args
SampleID<-args[1]
peaks_H3K4ME3<-as.numeric(args[2])
peaks_H3K27AC<-as.numeric(args[3])
peaks_overlap<-as.numeric(args[4])
# get filename for the plot PDF
plot_filename<-paste0(SampleID,"_peaks.pdf")
# make a Venn object, don't print it yet
venn<-draw.pairwise.venn(area1=peaks_H3K4ME3+peaks_overlap,area2=peaks_H3K27AC+peaks_overlap,cross.area=peaks_overlap,category=c('H3K4ME3','H3K27AC'),fill=c('red','blue'),alpha=c(0.3,0.3),cex=c(2,2,2),cat.cex=c(1.25,1.25),main=SampleID,ind=FALSE)
# print it inside a PDF file, with a title
pdf(plot_filename,width = 8,height = 8)
grid.arrange(gTree(children=venn), top=SampleID) #, bottom="subtitle")
dev.off()
EOF
Sinji;
Thank you for the reply. I have experienced both of the programs and I can easily say that if you are comparing a lot of beds, HOMER is way better. Its output consisted of variety of information such as unique peaks which occur in single samples to peaks that occur in all the samples.
Give it a try.
Best,
Tunc.