EDIT: I've aggregated some of the scripts here, with example output here
I had previously made a post here on how to create venn diagrams in R using the venn output from HOMER's mergePeaks tool. However this is limited in scope to comparisons of 2 to 5 sets of peaks. For more flexibility and scalability, I created a script that uses the UpSetR package in R to create UpSet plots of the overlaps. The script itself can be found here with a sample basic workflow & output. I've also copied the script below:
#!/usr/bin/env Rscript
## USAGE: multi_peaks_UpSet_plot.R <sampleID> /path/to/venn.txt
## This script will process venn output from HOMER mergePeaks and make an UpSet plot
## this is currently only for R version 3.3.0; module unload r; module load r/3.3.0; R
# resources:
# http://www.caleydo.org/tools/upset/
# https://cran.r-project.org/web/packages/UpSetR/vignettes/basic.usage.html#example-5-alternative-input-formats
# https://github.com/hms-dbmi/UpSetR
# ~~~~~ LOAD PACKAGES ~~~~~~~ #
library("UpSetR"); library("ggplot2"); library("grid"); library("plyr")
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #
# ~~~~~ GET SCRIPT ARGS ~~~~~~~ #
args <- commandArgs(TRUE); cat("Script args are:\n"); args
SampleID<-args[1]
venn_table_file<-args[2]
plot_outdir<-dirname(venn_table_file)
plot_filepath<-paste0(plot_outdir,"/",SampleID,"_UpSetR_plot.pdf")
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #
# ~~~~~ PARSE THE VENN TABLE ~~~~~~~ #
# read in the venn text
venn_table_df<-read.table(venn_table_file,header = TRUE,sep = "\t",stringsAsFactors = FALSE,check.names = FALSE)
# > venn_table_df
# Sample2-H3K27AC.bed Sample1-H3K27AC.bed Total Name
# 1 X 29005 Sample1-H3K27AC.bed
# 2 X 19523 Sample2-H3K27AC.bed
# 3 X X 35344 Sample2-H3K27AC.bed|Sample1-H3K27AC.bed
# get the venn categories
venn_categories<-colnames(venn_table_df)[!colnames(venn_table_df) %in% c("Total","Name")]
cat("Venn categories are:\n"); venn_categories
# > cat("Venn categories are:\n"); venn_categories
# Venn categories are:
# [1] "Sample2-H3K27AC.bed" "Sample1-H3K27AC.bed"
# venn_categories
num_categories<-length(venn_categories)
cat("Num categories are:\n"); num_categories
# > num_categories<-length(venn_categories)
# > cat("Num categories are:\n"); num_categories
# Num categories are:
# [1] 2
# make a summary table
venn_summary<-venn_table_df[!colnames(venn_table_df) %in% venn_categories]
cat("Venn summary table is categories are:\n"); venn_summary
# > venn_summary<-venn_table_df[!colnames(venn_table_df) %in% venn_categories]
# > cat("Venn summary table is categories are:\n"); venn_summary
# Venn summary table is categories are:
# Total Name
# 1 29005 Sample1-H3K27AC.bed
# 2 19523 Sample2-H3K27AC.bed
# 3 35344 Sample2-H3K27AC.bed|Sample1-H3K27AC.bed
# write summary table
plot_filepath<-paste0(plot_outdir,"/",SampleID,"_UpSetR_plot.pdf")
write.table(venn_summary,file = paste0(plot_outdir,"/","venn_summary.tsv"),quote = FALSE,row.names = FALSE)
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #
# ~~~~~ SET UP THE PLOT ~~~~~~~ #
# convert the summary table to a numeric/int vector, with element names as the combination names
# # swap the | character with &; for passing to UpSet fromExpression
upset_expression<-setNames(venn_summary[['Total']],gsub("|","&",venn_summary[['Name']],fixed = TRUE))
# > upset_expression
# Sample1-H3K27AC.bed Sample2-H3K27AC.bed
# 29005 19523
# Sample2-H3K27AC.bed&Sample1-H3K27AC.bed
# 35344
# save the plot into a PDF
cat("Plot output file is:\n"); plot_filepath
pdf(plot_filepath,width = 8,height = 8)
upset(fromExpression(upset_expression),order.by = "degree", decreasing = F,mainbar.y.label = "Overlapping Peaks", sets.x.label = "Peaks per Category") # , group.by = "sets"
dev.off()
Example output:
Looks great steve. Can you post an example figure?
yes I've updated the original post and included example output here
Pretty creative plot - thanks!
Really cool way to visualise the overlapping peaks! Thanks Steve!