8.5 years ago
RandManP
I have 9 ChIP-seq datasets (3 samples, each with 3 biological replicates), and I want to check how well they correlate. If I bin the genome into windows of some size (e.g. 1000 bp) and count the number of reads per bin, should I normalize first and then compute the correlation, or is normalization unnecessary?
I have checked it, but I did not understand whether it normalizes first or not.
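Below is a minimal pure-Python sketch of the binning approach described in the question. It uses hypothetical read start positions on a single 5 kb chromosome instead of a real BAM file; in practice deepTools' multiBamSummary would do the counting. It also shows one property relevant to the question: library-size normalization such as CPM multiplies every bin of a sample by the same constant, and neither Pearson nor Spearman correlation changes under a positive rescaling like that. Normalizations that are not a single per-sample scale factor (input subtraction, log ratios, or a log transform before Pearson) can change the result.

```python
import math


def bin_counts(read_starts, chrom_length, bin_size=1000):
    """Count reads per fixed-size bin from 0-based read start positions on one chromosome."""
    counts = [0] * math.ceil(chrom_length / bin_size)
    for pos in read_starts:
        counts[pos // bin_size] += 1
    return counts


def cpm(counts):
    """Counts per million: every bin is divided by the same per-sample total."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical read starts for two replicates sequenced to different depths.
    rep1 = [120, 450, 980, 1010, 1500, 1600, 1700, 2500, 4100, 4200]
    rep2 = [100, 200, 300, 900, 1100, 1200, 1300, 1400, 1800, 1900,
            2100, 2200, 3500, 4300, 4400, 4500]
    c1, c2 = bin_counts(rep1, 5000), bin_counts(rep2, 5000)
    # The raw and CPM lines print the same correlations (up to float rounding).
    print("raw:", spearman(c1, c2), pearson(c1, c2))
    print("CPM:", spearman(cpm(c1), cpm(c2)), pearson(cpm(c1), cpm(c2)))
```

Changing `bin_size` (e.g. to a larger window) is the only thing needed to try the coarser binning suggested later in the thread.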
As vchris states, you can't normalize a BAM file itself (because you normalize signal, not sequencing), and it's inefficient to normalize the signal from a BAM file for just a couple of correlation plots, so it's better to write that signal out to a bigWig and then use the bigWig for the correlation plot.
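A hedged sketch of that bigWig route with deepTools 3.x follows. It only assembles the command lines and does not run them; the file names, the 1,000 bp bin size and the CPM normalization are illustrative assumptions, not choices made in this thread. The flags used (`bamCoverage --normalizeUsing`, `multiBigwigSummary bins`, `plotCorrelation --corMethod`) should match deepTools 3.x, but check them against your installed version.

```python
import os
import shlex


def deeptools_commands(bam_files, bin_size=1000, normalization="CPM"):
    """Build (but do not run) a bamCoverage -> multiBigwigSummary -> plotCorrelation pipeline."""
    bigwigs = []
    commands = []
    for bam in bam_files:
        # Hypothetical naming scheme: sample.bam -> sample.bw
        bw = os.path.splitext(bam)[0] + ".bw"
        bigwigs.append(bw)
        commands.append(["bamCoverage", "--bam", bam, "--outFileName", bw,
                         "--binSize", str(bin_size), "--normalizeUsing", normalization])
    # Summarize all normalized bigWigs over the same genome-wide bins.
    commands.append(["multiBigwigSummary", "bins", "--bwfiles", *bigwigs,
                     "--binSize", str(bin_size), "--outFileName", "summary.npz"])
    # Rank-based correlation heatmap, as recommended in the thread.
    commands.append(["plotCorrelation", "--corData", "summary.npz",
                     "--corMethod", "spearman", "--whatToPlot", "heatmap",
                     "--plotFile", "spearman_heatmap.png"])
    return commands


if __name__ == "__main__":
    # Hypothetical file names for 3 samples x 3 biological replicates.
    bams = [f"sample{s}_rep{r}.bam" for s in (1, 2, 3) for r in (1, 2, 3)]
    for cmd in deeptools_commands(bams):
        print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, or each list can be passed to `subprocess.run(cmd, check=True)` once deepTools is installed.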
Make sure you use --corMethod spearman for the plot, though. Using Pearson's for this would be a crime against statistics, since the signal is not even close to being either normally distributed or linear. To be honest, even Spearman's isn't great, since a big peak almost disappearing would contribute roughly the same variance as a blip of noise in a gene desert. You're probably best off using deepTools to do the normalization and bigWig creation (and getting the Spearman rho while you're there), but then also using the bigWig to make a standard two-factor distribution-of-variance plot with relative and absolute signal difference. Such a heatmap wouldn't give you a nice single correlation value, though; you'd have to look at a lot of these heatmaps to get an idea of which samples look similar and which look different, since every ChIP assay would produce a different kind of plot.

Ahaha, spot on! The Pearson method really can screw up the entire hypothesis at times. But yes, if you want a plot of the normalized profile, you can create bigWig tracks normalized by library size and proceed as John describes. However, if you are only interested in comparing the BAM files genome-wide, or even just at promoters, then simply bin them with a larger bin size and see how they correlate with deepTools.
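To make the Pearson-versus-Spearman point concrete, here is a toy example with ten hypothetical bins: nine background bins whose values run in opposite directions in the two samples, plus one very strong peak shared by both. Pearson is dominated by that single bin, while Spearman, working on ranks, reflects the disagreement in the other nine. This only illustrates the outlier problem; it does not demonstrate the weakness of Spearman's that was also mentioned.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def ranks(values):
    """1-based ranks; assumes no tied values, to keep the example short."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(x, y):
    # Spearman's rho is the Pearson correlation of the ranks.
    return pearson(ranks(x), ranks(y))


# Hypothetical per-bin signal: bins 1-9 are anti-correlated noise, bin 10 is a shared peak.
sample_a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
sample_b = [9, 8, 7, 6, 5, 4, 3, 2, 1, 1000]

print(f"Pearson:  {pearson(sample_a, sample_b):.3f}")   # 1.000, driven by the single peak
print(f"Spearman: {spearman(sample_a, sample_b):.3f}")  # -0.455
```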