Question

H3K4me3 occupancy visualisation

0

Entering edit mode

9.6 years ago

Constantine ▴ 290

Hello

I have an issue when it comes to visualising NGS data (ChIP-Seq). The image below represents what is currently known about occupancy of H3K4me3.

hesc.H3k4me3.tss.all

I've downloaded many different data of H3K4me3 ChIP-Seq from the EncodeProject and when I try to visualize H3K4me3 occupancy I get something like this:

http://postimg.org/image/qk7oie2s7/

Initially, I used the already uploaded BigWig and Bed files (both from mouse and human) to generate the plots. As this didn't give me the expected result, I downloaded the fastq files, aligned them to different tools and used different peakcallers. In the end, I was getting the same plot.

To plot occupancy I've used either ngsplot or deeptools.

Am I missing something here? I've used more than 20 datasets and I end up with the same graph

Thank you in advance

ChIP-Seq • 3.4k views

ADD COMMENT • link updated 2.6 years ago by Ram 45k • written 9.6 years ago by Constantine ▴ 290

Ram · Answer 1 · 2015-09-09

Have you tried using the same files and same tools used to create the first image that you posted?

You can create the image on the right using ChIPSeeker

I also used a tweak by using the tagmatrix from ChIPSeeker and plotting the number of tags to get something similar to your graph on the left. here's the code, it starts from a bed-formatted file. I used it with ENCODE data and it works as expected.

library(ChIPseeker); library(TxDb.Hsapiens.UCSC.hg19.knownGene)
# get TSS info
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
promoter <- getPromoters(TxDb=txdb, upstream=2000, downstream=2000)
#read data
data<- readPeakFile(peakfile = "data.narrowPeak")
# get tag matrix
tagMatrix <- getTagMatrix(data, windows=promoter)
# calculate how many tags you have at each position
tags <- c()
for(i in 1:dim(tagMatrix)[2]){
  tags <- c(tags, sum(tagMatrix[,i]))
}

# plot it
plot(tags, type="n", ylim=c(0, 120), frame.plot=F, xlab="", ylab="Binding",xaxt="n")
axis(1, at=seq(0, 4000, 1000),labels=c("","","","",""),col.axis="black", las=2, cex.axis=1.2, tck=-.01)

mtext(side=1,text=c("-2kb", "-1kb", "TSS", "+1kb", "+2kb"), at=seq(0, 4000, 1000), line=0.4)
mtext(side=1,text=("Distance to TSS"), at=2000, line=2, cex=1.4)
lines(tags, col="blue", lwd=5)

Ram · Answer 2 · 2015-09-15

I agree with Pierre, the image that you showed as an example is probably highlighting the -1 and +1 nucleosomes. To get such picture you need to a sharp and narrow positioning of the reads.

With deepTools you can center the reads over the expected fragment length in order to get sharper peaks (using either bamCoverage or bamCompare). Also, in the image that you showed there is a peak between the TSS and the TES. To get the image that you want you need to use the referencePoint option around TSS.

score 1 · Answer 3 · 2015-09-10

This has nothing to do with the tool that you are using. This is biological. Are you taking into account the directionality of your peaks/TSS when you are doing your plots (+/- strands)? What you see in the K4me3 plot up is that the signal on +1 nucleosome is higher than the -1 nucleosome. If you are not taking the directionality into account, both +1 and -1 will cancel/add to each other and the profile looks like the one you are having.

Also, notice the small shoulder in your plot at the 5' and 3' of your peaks.

Hope this will help

Ram · Answer 4 · 2015-09-09

0

Entering edit mode

9.6 years ago

Chris Fields ★ 2.2k

The output from the first file looks quite a bit like metaseq. As TriS mentioned, might be worth trying to replicate the output using that toolkit.

ADD COMMENT • link updated 2.6 years ago by Ram 45k • written 9.6 years ago by Chris Fields ★ 2.2k