I am dealing with analyzing a set of cutNtag samples in which we expect global histone mark changes (K27me3) due to perturbation in a chromatin modifier. Unfortunately, for a series of experimental challenges, we do not have spike-ins.
For this reason, I am looking for other ways to normalize this data, as RPKM normalization based on library size will flatten out the expected global shift. From my understanding, this is quite challenging.
What I tried so far:
I tried this tool: https://github.com/stjude/ChIPseqSpikeInFree . However, the results are the exact opposite of what we expected (and also observed in western blot).
I found this example in the
csaw
book (https://bioconductor.org/books/release/csawBook/h3k27me3-wild-type-versus-knock-out.html). However, the results look puzzling to me: how is it possible that EZH2 KO leads to higher K27me3 levels? I'd guess a failure of the normalization method rather than our biological knowledge being wrong. Am I missing something?Read the diffBind vignette (https://bioconductor.org/packages/release/bioc/vignettes/DiffBind/inst/doc/DiffBind.pdf) I haven't found a clear path but I might try the approaches suggested there.
Tried to set a hard cut-off to separate 'background' bins and signal bins based on a minimum fold-change (i.e. taking everything that has at least 3x signal compared to the mean, setting the others to zero). However, this feels arbitrary and also introduces some instability in the analyses.
At this point, I don't know if I should trust the counter-intuitive results I got, search for other methods, or else. Any advice is appreciated, Best
The first thing I would do is to normalize a bigwig track by reeds per million and just look at the data in the igv. Does global changes in your case mean that you expect generally more/fewer sites that show signal for this histone mark or does it mean that existing peaks with this histone mark have higher/lower counts compared to control? Can you show an MA plot if you simply use the default normalization strategy for example as in DESeq2 or edgeR, as often this plot gives a good idea which normalization might be appropriate. Please also add a screenshot from the igv from a locus that you think is representative (like a 100kb or so stretch).
Thank you for the follow-up. Here is an example locus after normalizing with RPKM using deeptools bamCoverage. In theory, I expect changes in peaks that already exist in the WT rather than having new peaks or losing some of them. This locus behaves as expected, but many others do not (I see stability and opposite trends too).
For the MA plot, could you clarify if I should do it for consensus peaks, fixed genomic bins, or else? I will do it next.
Consensus peaks I would use for now.
This is the result of running DiffBind standard pipeline with MACS2 broadPeaks. I have 2 WT + (4+2) KO samples. The MA plot is done on the contrast KO vs WT, and a total of ~150k peaks were identified (it should be the union of the consensus if I got this right). This plot suggests very mild changes, am I correct?
I guess this is DESeq2 sheunken logFCs which are moderated towards zero when noise is high and/or power is low (or simply no DEGs exist). What is your sample size?
My sample size is 2 WT samples and 4+2 KO samples. So yeah the sample size in the control group is very low. Do you suggest trying other normalization methods instead? Maybe edgeR?
This is the result using the DBA_EDGER mode.
The normalization is essentially the same, just that edgeR in DiffBInd does not as aggressively shrink the logFCs towards zero when avidence for DE is low. So, the MA-plot looks more spread out, but it's probably almost the same result, meaning very low differences. Isn't H3K27me3 very broad? Maybe some window-based approach as implemented in csaw might be of use here, but the underlying problem of underpowerment probably does not change. I mean, you see quuite some regions with fold changes beyond -1/1 but not significant. Check data in a PCA; see whether some batch effects are present and could maybe be corrected for.
Any help in this post ChIP-seq normalization method when global loss of signal is expected. ?
Yes, I used that post as a reference and tried what suggested, thank you