H3K4me3 occupancy visualisation
4
0
Entering edit mode
9.2 years ago
Constantine ▴ 290

Hello

I have an issue when it comes to visualising NGS data (ChIP-Seq). The image below represents what is currently known about occupancy of H3K4me3.

hesc.H3k4me3.tss.all

I've downloaded many different data of H3K4me3 ChIP-Seq from the EncodeProject and when I try to visualize H3K4me3 occupancy I get something like this:

http://postimg.org/image/qk7oie2s7/

Initially, I used the already uploaded BigWig and Bed files (both from mouse and human) to generate the plots. As this didn't give me the expected result, I downloaded the fastq files, aligned them to different tools and used different peakcallers. In the end, I was getting the same plot.

To plot occupancy I've used either ngsplot or deeptools.

Am I missing something here? I've used more than 20 datasets and I end up with the same graph

Thank you in advance

ChIP-Seq • 3.1k views
ADD COMMENT
3
Entering edit mode
9.2 years ago
TriS ★ 4.7k

Have you tried using the same files and same tools used to create the first image that you posted?

You can create the image on the right using ChIPSeeker

I also used a tweak by using the tagmatrix from ChIPSeeker and plotting the number of tags to get something similar to your graph on the left. here's the code, it starts from a bed-formatted file. I used it with ENCODE data and it works as expected.

library(ChIPseeker); library(TxDb.Hsapiens.UCSC.hg19.knownGene)
# get TSS info
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
promoter <- getPromoters(TxDb=txdb, upstream=2000, downstream=2000)
#read data
data<- readPeakFile(peakfile = "data.narrowPeak")
# get tag matrix
tagMatrix <- getTagMatrix(data, windows=promoter)
# calculate how many tags you have at each position
tags <- c()
for(i in 1:dim(tagMatrix)[2]){
  tags <- c(tags, sum(tagMatrix[,i]))
}

# plot it
plot(tags, type="n", ylim=c(0, 120), frame.plot=F, xlab="", ylab="Binding",xaxt="n")
axis(1, at=seq(0, 4000, 1000),labels=c("","","","",""),col.axis="black", las=2, cex.axis=1.2, tck=-.01)

mtext(side=1,text=c("-2kb", "-1kb", "TSS", "+1kb", "+2kb"), at=seq(0, 4000, 1000), line=0.4)
mtext(side=1,text=("Distance to TSS"), at=2000, line=2, cex=1.4)
lines(tags, col="blue", lwd=5)
ADD COMMENT
0
Entering edit mode

I am writing because I am confused as to the definition of output of these two functions: getTagMatrix and plotAvgProf from the Chipseeker, it does not show the 'read count frequency' as the vignette of their output graphs is labelled. The label is a bit misleading.

In the plot you generate with ChipSeeker we do not even input any information about the number of reads. So to me the plot generated is the frequency or number of the genomic locations in the bed file (eg, peaks) around the TSS.

ADD REPLY
2
Entering edit mode
9.2 years ago
Fidel ★ 2.0k

I agree with Pierre, the image that you showed as an example is probably highlighting the -1 and +1 nucleosomes. To get such picture you need to a sharp and narrow positioning of the reads.

With deepTools you can center the reads over the expected fragment length in order to get sharper peaks (using either bamCoverage or bamCompare). Also, in the image that you showed there is a peak between the TSS and the TES. To get the image that you want you need to use the referencePoint option around TSS.

ADD COMMENT
1
Entering edit mode
9.2 years ago
Alternative ▴ 290

This has nothing to do with the tool that you are using. This is biological. Are you taking into account the directionality of your peaks/TSS when you are doing your plots (+/- strands)? What you see in the K4me3 plot up is that the signal on +1 nucleosome is higher than the -1 nucleosome. If you are not taking the directionality into account, both +1 and -1 will cancel/add to each other and the profile looks like the one you are having.

Also, notice the small shoulder in your plot at the 5' and 3' of your peaks.

Hope this will help

ADD COMMENT
0
Entering edit mode
9.2 years ago
Chris Fields ★ 2.2k

The output from the first file looks quite a bit like metaseq. As TriS mentioned, might be worth trying to replicate the output using that toolkit.

ADD COMMENT

Login before adding your answer.

Traffic: 1927 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6