Tutorial:Visualization of ChIP-Seq peak overlaps using HOMER mergePeaks and UpSetR
1
13
Entering edit mode
8.6 years ago
steve ★ 3.5k

EDIT: I've aggregated some of the scripts here, with example output here

I had previously made a post here on how to create venn diagrams in R using the venn output from HOMER's mergePeaks tool. However this is limited in scope to comparisons of 2 to 5 sets of peaks. For more flexibility and scalability, I created a script that uses the UpSetR package in R to create UpSet plots of the overlaps. The script itself can be found here with a sample basic workflow & output. I've also copied the script below:

#!/usr/bin/env Rscript

## USAGE: multi_peaks_UpSet_plot.R <sampleID> /path/to/venn.txt
## This script will process venn output from HOMER mergePeaks and make an UpSet plot
## this is currently only for R version 3.3.0; module unload r; module load r/3.3.0; R

# resources:
# http://www.caleydo.org/tools/upset/
# https://cran.r-project.org/web/packages/UpSetR/vignettes/basic.usage.html#example-5-alternative-input-formats 
# https://github.com/hms-dbmi/UpSetR

# ~~~~~ LOAD PACKAGES ~~~~~~~ #
library("UpSetR"); library("ggplot2"); library("grid"); library("plyr")
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # 


# ~~~~~ GET SCRIPT ARGS ~~~~~~~ #
args <- commandArgs(TRUE); cat("Script args are:\n"); args
SampleID<-args[1]
venn_table_file<-args[2]

plot_outdir<-dirname(venn_table_file)
plot_filepath<-paste0(plot_outdir,"/",SampleID,"_UpSetR_plot.pdf") 
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # 


# ~~~~~ PARSE THE VENN TABLE ~~~~~~~ #
# read in the venn text
venn_table_df<-read.table(venn_table_file,header = TRUE,sep = "\t",stringsAsFactors = FALSE,check.names = FALSE)
# > venn_table_df
# Sample2-H3K27AC.bed Sample1-H3K27AC.bed Total                                Name
# 1                                   X 29005                   Sample1-H3K27AC.bed
# 2                 X                   19523                   Sample2-H3K27AC.bed
# 3                 X                 X 35344 Sample2-H3K27AC.bed|Sample1-H3K27AC.bed


# get the venn categories
venn_categories<-colnames(venn_table_df)[!colnames(venn_table_df) %in% c("Total","Name")] 
cat("Venn categories are:\n"); venn_categories
# > cat("Venn categories are:\n"); venn_categories
# Venn categories are:
#   [1] "Sample2-H3K27AC.bed" "Sample1-H3K27AC.bed"


# venn_categories
num_categories<-length(venn_categories)
cat("Num categories are:\n"); num_categories
# > num_categories<-length(venn_categories)
# > cat("Num categories are:\n"); num_categories
# Num categories are:
#   [1] 2


# make a summary table
venn_summary<-venn_table_df[!colnames(venn_table_df) %in% venn_categories]
cat("Venn summary table is categories are:\n"); venn_summary
# > venn_summary<-venn_table_df[!colnames(venn_table_df) %in% venn_categories]
# > cat("Venn summary table is categories are:\n"); venn_summary
# Venn summary table is categories are:
#   Total                                Name
# 1 29005                   Sample1-H3K27AC.bed
# 2 19523                   Sample2-H3K27AC.bed
# 3 35344 Sample2-H3K27AC.bed|Sample1-H3K27AC.bed


# write summary table
plot_filepath<-paste0(plot_outdir,"/",SampleID,"_UpSetR_plot.pdf") 
write.table(venn_summary,file = paste0(plot_outdir,"/","venn_summary.tsv"),quote = FALSE,row.names = FALSE)
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # 


# ~~~~~ SET UP THE PLOT ~~~~~~~ #
# convert the summary table to a numeric/int vector, with element names as the combination names
# # swap the | character with &; for passing to UpSet fromExpression
upset_expression<-setNames(venn_summary[['Total']],gsub("|","&",venn_summary[['Name']],fixed = TRUE))
# > upset_expression
# Sample1-H3K27AC.bed                   Sample2-H3K27AC.bed
# 29005                               19523
# Sample2-H3K27AC.bed&Sample1-H3K27AC.bed
# 35344

# save the plot into a PDF
cat("Plot output file is:\n"); plot_filepath

pdf(plot_filepath,width = 8,height = 8)
upset(fromExpression(upset_expression),order.by = "degree", decreasing = F,mainbar.y.label = "Overlapping Peaks", sets.x.label = "Peaks per Category") # , group.by = "sets"
dev.off()

Example output: screen shot 2017-10-30 at 4 31 15 pm

ChIP-Seq peaks R HOMER • 6.9k views
ADD COMMENT
0
Entering edit mode

Looks great steve. Can you post an example figure?

ADD REPLY
1
Entering edit mode

yes I've updated the original post and included example output here

ADD REPLY
0
Entering edit mode

Pretty creative plot - thanks!

ADD REPLY
0
Entering edit mode

Really cool way to visualise the overlapping peaks! Thanks Steve!

ADD REPLY
0
Entering edit mode
8.3 years ago
morovatunc ▴ 560

Hi steve, I would like to ask you something related the interpretation of the HOMER Venn content.

I think venn matrix should be diagonal but when I check the values it seems they are not. If you have any idea about this, could you enlighten me ?

I think homer is a great tool of identifying overlapping peaks but this is the only disadvantage that which makes me confused a lot.

Please check this link for a example. I have asked this a while ago but yet to be solved.

HOMER mergepeaks matrix interpreatation

ADD COMMENT
1
Entering edit mode

That is a great question, but unfortunately I do not have an answer. I have never understood the purpose or usage of the matrix output, so I usually just ignore it / don't output it.

ADD REPLY
0
Entering edit mode

To be honest, I also kept it unused. The resulting file of the overlaps is very good. Especially where you can get the information about how many sub peaks are in a big merged peak.

Anyways, thank you for the anwser!

ADD REPLY

Login before adding your answer.

Traffic: 2011 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6