Hi all,
I have paired-end ATAC-seq data for 4 libraries (2 replicates for each of 2 samples). I aligned the reads with Bowtie2, filtered MT reads and duplicates using Picard, and then called peaks on the BAM files with MACS2. I also did a differential peak analysis using deeptools and filtered the results by FDR<0.05 and abs(2foldchange)>2.
After that, I generated peak density heatmaps using deeptools. However, in the plot at the top of the heatmaps, the peak heights are not the same for the 4 files, even though I normalized the BAM files while converting them to bigWig with bamCoverage.
My questions are: Should the height of those peaks be the same, or is a slight difference acceptable? If not, how can I normalize the data? Should I normalize the BAM files and then do the peak calling again, and if so, which tool do you suggest? Or is DiffBind normalization okay? I am also really confused about the difference between coverage file normalization and peak normalization. Lastly, as written in the post "Normalization and differential analysis in ATAC-seq data", how can I downsample each sample?
If you could explain these, I would appreciate it.
Thank you so much for your help!
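On the downsampling question, one common approach is to subsample every BAM file down to the size of the smallest library. The Python sketch below only computes the fraction to keep per sample and formats it for `samtools view -s`, which takes a value of the form SEED.FRACTION. The sample names, read counts, and file names are made up for illustration, and the counts are assumed to be taken after MT and duplicate filtering.

```python
def downsample_fractions(read_counts):
    """Fraction of reads to keep per sample so all match the smallest library."""
    target = min(read_counts.values())
    return {name: target / n for name, n in read_counts.items()}


def samtools_subsample_arg(fraction, seed=42):
    """Format the SEED.FRACTION value taken by `samtools view -s`."""
    rounded = round(fraction, 4)
    # A fraction that rounds to 0 or 1 cannot be expressed in this format.
    if not 0 < rounded < 1:
        raise ValueError("fraction must be strictly between 0 and 1")
    digits = f"{rounded:.4f}".split(".")[1]
    return f"{seed}.{digits}"


# Hypothetical filtered read counts per library (not from the original post).
read_counts = {
    "ctrl_rep1": 40_000_000,
    "ctrl_rep2": 50_000_000,
    "treat_rep1": 32_000_000,
    "treat_rep2": 64_000_000,
}

fractions = downsample_fractions(read_counts)

# Build one command per library that is larger than the smallest one.
# Input and output file names are placeholders.
commands = {
    name: (
        f"samtools view -b -s {samtools_subsample_arg(frac)} "
        f"-o {name}.sub.bam {name}.bam"
    )
    for name, frac in fractions.items()
    if frac < 1
}

for cmd in commands.values():
    print(cmd)
```

The smallest library is left untouched, so no command is generated for it. Whether downsampling is preferable to scaling with size factors is a judgment call that depends on the experiment.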
I disagree in part: it is the majority of peaks that should be similar between samples, not the background. Libraries can have quite different background noise levels due to technical artifacts. In most cases, and this is the assumption behind normalization strategies such as TMM from edgeR or RLE from DESeq2, you have a large number of regions (peaks) that do not change between conditions. The goal of normalization is to find a size factor that centers these regions at a log fold change of roughly zero between samples. This is important in ATAC-seq, and even more important in assays like ChIP-seq, where technical variation from antibody pulldown efficiency can differ strikingly, so background levels can vary a lot even though the peaks are not actually changing much.
I think whether you should normalize so that peaks are most similar or so that background is most similar depends a bit on the experiment. I've worked with a lot of people perturbing things in ways that I expect to cause a large change in peaks. Given that, normalizing for similar backgrounds makes the most sense. If, however, one expects more modest or targeted changes, then I completely agree that normalizing over peaks is preferable.
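To make the size-factor idea concrete, here is a minimal NumPy sketch of a median-of-ratios calculation on a peak-by-sample count matrix. It is written in the spirit of the RLE approach mentioned above, not a reimplementation of DESeq2 or edgeR, and the function name and toy counts are invented for illustration.

```python
import numpy as np


def median_of_ratios_size_factors(counts):
    """Size factors from a count matrix (rows = peaks, columns = samples)."""
    counts = np.asarray(counts, dtype=float)
    # Use only peaks with a non-zero count in every sample so logs are finite.
    keep = (counts > 0).all(axis=1)
    if not keep.any():
        raise ValueError("no peak has non-zero counts in all samples")
    log_counts = np.log(counts[keep])
    # Per-peak log geometric mean across samples acts as a pseudo-reference.
    log_ref = log_counts.mean(axis=1, keepdims=True)
    # Size factor = median ratio of each sample to the pseudo-reference.
    return np.exp(np.median(log_counts - log_ref, axis=0))


# Toy matrix (made-up numbers): sample B is sample A sequenced twice as deep,
# except for the fourth peak, which changes tenfold.
counts = np.array([[10, 20], [50, 100], [30, 60], [40, 400], [5, 10]])

size_factors = median_of_ratios_size_factors(counts)
normalized = counts / size_factors
```

Because the median is taken over all peaks, the single strongly changing peak does not move the size factors, and the four unchanged peaks end up with equal normalized counts in both samples. This only holds when most peaks are in fact unchanged, which is exactly the assumption under discussion.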
Agreed. In my experience it depends on the context. If you have differences in signal-to-noise ratio, go for peak normalization. If you have very different composition, go for background. If you are unlucky and have both effects combined, say a ChIP for H3K27ac in a very early cell and a terminally differentiated one, plus very different antibody efficiencies between the conditions, then try both methods and see which one does better at pushing the majority of regions towards a log FC of zero. Maybe also inspect regions that you know do not change, either on a genome browser or by plotting counts manually. It is a trade-off.
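A rough way to run that comparison is to compute the median log2 fold change over a set of regions under each candidate normalization and see which lands closer to zero. The sketch below assumes two samples and two pairs of size factors that were already estimated elsewhere, one from peaks and one from background bins. All numbers and names are hypothetical. The check is most informative on regions that were not used to derive the factors, for example regions you believe are unchanged.

```python
import numpy as np


def median_log2_fc(counts_a, counts_b, sf_a, sf_b, pseudocount=1.0):
    """Median log2(B/A) over regions after dividing by the size factors."""
    a = np.asarray(counts_a, dtype=float) / sf_a
    b = np.asarray(counts_b, dtype=float) / sf_b
    # The pseudocount avoids division by zero for empty regions.
    return float(np.median(np.log2((b + pseudocount) / (a + pseudocount))))


# Made-up counts over five regions in two samples.
regions_a = [100, 200, 150, 80, 120]
regions_b = [210, 390, 310, 170, 900]

# Hypothetical size factors (sample A, sample B) from two strategies.
candidates = {
    "peaks": (1.0, 2.0),
    "background": (1.0, 1.2),
}

results = {
    name: median_log2_fc(regions_a, regions_b, sf_a, sf_b)
    for name, (sf_a, sf_b) in candidates.items()
}

# Pick the strategy whose median log2 fold change is closest to zero.
best = min(results, key=lambda name: abs(results[name]))
print(results, best)
```

In this toy case the peak-based factors center the regions better, but with a global shift in accessibility the background-based factors could be the more sensible choice even if they leave the median away from zero.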