4.6 years ago
srhic
Hello,
I have some ChIP-seq data for three different conditions, for which I have plotted RPKM-normalized counts around features of interest using deepTools. The plot clearly shows differences in ChIP signal between the conditions, but I am concerned about different background levels between conditions. The blue sample in the plot seems to have lower signal than the other two samples no matter which regions I plot.
Any ideas on how I can normalize the samples so they have the same basal signal? I assume some sort of z-score normalisation may work, but I am not sure how to do it with my bigwig files.
Thanks
What are these samples? I personally like to explore normalization efficiency with MA-plots. A properly-normalized sample should have the majority of data points centered somewhat at y = 0, or at least there should be a somewhat symmetric distribution of the data points around y = 0 depending on how dramatic the changes are between samples. Given you have a count table of normalized counts (not log2 transformed), use for each pairwise comparison:
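For illustration, here is a minimal sketch of the two quantities an MA-plot shows for one pairwise comparison: M (the log2 ratio, on the y-axis) and A (the mean log2 signal, on the x-axis). It is plain Python rather than R, and the function name `ma_values`, the pseudocount of 1, and the toy counts are assumptions made for the example, not something taken from this thread.

```python
import math
from statistics import median


def ma_values(x, y, pseudocount=1.0):
    """Return (A, M) lists for two samples of normalized, non-log counts.

    M = log2(x) - log2(y)          (log fold change, y-axis)
    A = 0.5 * (log2(x) + log2(y))  (average log signal, x-axis)
    The pseudocount (an assumption here) avoids log2(0) for empty regions.
    """
    if len(x) != len(y):
        raise ValueError("both samples must cover the same regions")
    a_vals, m_vals = [], []
    for xi, yi in zip(x, y):
        lx = math.log2(xi + pseudocount)
        ly = math.log2(yi + pseudocount)
        m_vals.append(lx - ly)
        a_vals.append(0.5 * (lx + ly))
    return a_vals, m_vals


# Toy counts (hypothetical): sample_a is roughly twice sample_b everywhere,
# which mimics a global offset that normalization has not removed.
sample_a = [7.0, 15.0, 31.0, 63.0]
sample_b = [3.0, 7.0, 15.0, 31.0]

a_vals, m_vals = ma_values(sample_a, sample_b)
median_m = median(m_vals)
print("median M:", median_m)
```

Plotting `a_vals` against `m_vals` with any plotting library gives the MA-plot. If the median of M sits clearly away from 0, as it does in this toy case, that suggests a global shift between the two samples that the current normalization has not corrected.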
Without knowing the details, I can already predict that naive per-million scaling messes things up and that you need a more elaborate normalization strategy, but let's see how the plots look. Is one of these samples an input sample?
By the way you have to paste the full link of the image into the image field. In the above image that would be
https://i.ibb.co/6Zyhb1b/chip.png
so including the suffix.
Thanks, the samples are histone marks under three different treatment conditions. I just have the bigwig files output by deepTools. I will try to import them into R and make a count table. Will try and get back.
Try to make a count matrix based on the merged peaks directly from the BAM files, e.g. using featureCounts. For normalization, also see: A: ATAC-seq sample normalization (quantil normalization). It applies to ChIP-seq as well.
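As a rough illustration of what quantile normalization does to a count matrix, here is a minimal plain-Python sketch. It is a generic textbook version and not necessarily the exact procedure from the linked post; the function name `quantile_normalize` and the toy counts are assumptions made for the example.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of samples (one list of counts per sample).

    Each value is replaced by the mean, across samples, of the values that
    share its rank. Ties are broken by order of appearance, which is a
    simplification compared with rank-averaging implementations.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all samples must cover the same regions")

    # Reference distribution: mean across samples at each rank.
    sorted_cols = [sorted(col) for col in columns]
    reference = [
        sum(col[rank] for col in sorted_cols) / len(columns)
        for rank in range(n)
    ]

    normalized = []
    for col in columns:
        # Region indices ordered from lowest to highest count.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Toy matrix (hypothetical): two samples, three peak regions each.
counts = [
    [5.0, 2.0, 3.0],
    [4.0, 1.0, 6.0],
]
normalized = quantile_normalize(counts)
print(normalized)
```

After this step every sample has the same distribution of values and only the ranking of regions within each sample is kept. That is a strong assumption, so it is worth checking the result against MA-plots before relying on it.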
I am trying out quantile normalization the way you described it for ATAC-seq. I was also able to get some good results using HOMER. I divided the genome into windows with bedtools and then extracted counts for those windows using HOMER, which has an option that allows the counts to be normalized using the rlog function of DESeq2. I am not sure I completely understand the rlog normalization or whether it is the correct method to use, but it made the profiles look much more similar. I will also see how the edgeR approach you described works. Thanks!