Question

Visualization of Homer mergePeaks output Venn?

2

Entering edit mode

9.1 years ago

morovatunc ▴ 560

Hi,

I am trying to visualise my overlapped chip-seq peak regions which I analysed with Homer mergePeaks function. I have got one venn info file and a "result" file. I would like to use that venn info file then visualise it but when I looked for visualisation libraries or programs, I did not find a method which merits my expectations.

the primary problem is my data is big. ( relatively :) ). I have 19 datasets in one conditions group and 9 datasets in healthy one. I have read making venn diagram for more than 3 datasets would not be smart on biostar tread.

I am trying to find overlapped regions of transcriptional factors that why I wanna know which transcriptional factors sites are most common.

I am a python coding and R mediocre.

Please don't link me to Venn/Euler Diagram Of Four Or More Sets and Draw Diagrams For Intersection Between Many Sets threads I have already read them 0192308 times), also if you think homer is not the best tool for finding the overlaps, please feel free to advice others. (Yes, I do know monkseq)

Thank you very much for your help.

Best regards,
Tunc

overlap Venn euler HOMER ChIP-Seq • 8.3k views

ADD COMMENT • link updated 2.3 years ago by Ram 44k • written 9.1 years ago by morovatunc ▴ 560

1

Entering edit mode

8.6 years ago

Sinji ★ 3.2k

Making a VennDiagram for 19 datasets is probably a very bad idea, but it seems like you already know that. I do now know of any decent visualization for so many datasets, but you may be interested in an UpSet plot. As for recommendations on finding overlaps, I recommend using bedtools intersect. I've never worked with the HOMER merge so I can't really compare them, but i've always used bedtools for this type of analysis.

ADD COMMENT • link 8.6 years ago by Sinji ★ 3.2k

0

Entering edit mode

Sinji;

Thank you for the reply. I have experienced both of the programs and I can easily say that if you are comparing a lot of beds, HOMER is way better. Its output consisted of variety of information such as unique peaks which occur in single samples to peaks that occur in all the samples.

Give it a try.

Best,

Tunc.

ADD REPLY • link 8.6 years ago by morovatunc ▴ 560

score 5 · Accepted Answer · 2016-04-25

EDIT: I made a script that can parse the venn.txt output of HOMER mergePeaks for comparisons of 2 to 5 peak files (bed files) and automatically create Venn Diagrams. All you need to do is pass it a sample ID (e.g. "ABC") and the venn.txt file output by HOMER, it will create the plot in the same directory as the venn.txt file. This uses R and the VennDiagram package. Script is located here: https://github.com/stevekm/Bioinformatics/blob/master/HOMER_mergePeaks_multiVenn/multi_peaks_Venn.R

EDIT2: I also posted an implementation of this with Upset plots, which allows for >5 comparison categories, here: Visualization of ChIP-Seq peak overlaps using HOMER mergePeaks and UpSetR

EDIT3: scripts have been moved here

In my experience it is easier to just count the number of entries (lines) in each of the bed files output by HOMER mergePeaks and pass these values to R for plotting, instead of trying to parse the venn.txt file. This is the script I am using for this purpose (including the bash mergePeaks commands). You should be able to easily modify it to add more entries

#!/bin/bash
# BED files with the peaks to overlap
tmp_outH3K4ME3="peaks_H3K4ME3.bed"
tmp_outH3K27AC="peaks_H3K27AC.bed"
# a sample ID
tmp_sampleID="ABC"

# HOMER mergePeaks
mergePeaks "$tmp_outH3K4ME3" "$tmp_outH3K27AC" -prefix mergepeaks -venn mergepeaks_venn

# the mergePeaks file outputs names:
tmp_mergeH3K4ME3="mergepeaks_${tmp_outH3K4ME3}"
tmp_mergeH3K27AC="mergepeaks_${tmp_outH3K27AC}"

# count the number of unique peaks
num_H3K4ME3=$(tail -n +2 $tmp_mergeH3K4ME3 | wc -l)
echo "num_H3K4ME3 is $num_H3K4ME3"
num_H3K27AC=$(tail -n +2 $tmp_mergeH3K27AC | wc -l)
echo "num_H3K27AC is $num_H3K27AC"

# count the number of peaks in common
num_overlap=$(tail -n +2 "mergepeaks_${tmp_outH3K4ME3}_${tmp_outH3K27AC}" | wc -l)

# plot the values in a pairwise venn in R
# # make sure the correct version of R is loaded:
module unload r
module load r/3.2.0
Rscript --slave --no-save --no-restore - "$tmp_sampleID" "$num_H3K4ME3" "$num_H3K27AC" "$num_overlap" <<EOF
  ## R code
  # load packages
  library('VennDiagram')
  library('gridExtra')
  # get script args, print them to console
  args <- commandArgs(TRUE); cat("Script args are:\n"); args
  SampleID<-args[1]
  peaks_H3K4ME3<-as.numeric(args[2])
  peaks_H3K27AC<-as.numeric(args[3])
  peaks_overlap<-as.numeric(args[4])
  # get filename for the plot PDF
  plot_filename<-paste0(SampleID,"_peaks.pdf") 
  # make a Venn object, don't print it yet
  venn<-draw.pairwise.venn(area1=peaks_H3K4ME3+peaks_overlap,area2=peaks_H3K27AC+peaks_overlap,cross.area=peaks_overlap,category=c('H3K4ME3','H3K27AC'),fill=c('red','blue'),alpha=c(0.3,0.3),cex=c(2,2,2),cat.cex=c(1.25,1.25),main=SampleID,ind=FALSE)
  # print it inside a PDF file, with a title
  pdf(plot_filename,width = 8,height = 8)
  grid.arrange(gTree(children=venn), top=SampleID) #, bottom="subtitle")
  dev.off()
EOF

score 3 · Accepted Answer · 2016-04-25

3

Entering edit mode

8.6 years ago

Ryan Dale 5.0k

Another alternative is a "binary heatmap". It scales better than a Venn diagram, though combinatorial binding of 19 factors is going to be complex no matter what way you look at it.

Here's a working example:

ADD COMMENT • link 8.6 years ago by Ryan Dale 5.0k