Question

Normalization of ATAC-seq data for a proper deepTools heatmap

1

4.7 years ago

kinalimeric ▴ 40

Hi all,

I have 4 paired-end ATAC-seq datasets (2 replicates for each of 2 samples). I aligned the reads with Bowtie2, filtered out MT reads, removed duplicates with Picard, and then called peaks on the BAM files with MACS2. I also did a differential peak analysis using deeptools and filtered the results by FDR<0.05 and abs(2foldchange)>2.

After that, I generated peak density heatmaps using deeptools. However, in the profile plots on top of the heatmaps the peak heights are not the same for the 4 files, even though I normalized the BAM files while converting them to bigWig with bamCoverage.

My questions are: Should the heights of those peaks be the same, or is a slight difference acceptable? If it is not acceptable, how can I normalize the data? Should I normalize the BAM files and then do the peak calling again, and if so, which tool do you suggest? Or is the DiffBind normalization okay? I am also really confused about the difference between coverage file normalization and peak normalization. Lastly, as written in this post, Normalization and differential analysis in ATAC-seq data, how can I downsample each sample?

I would appreciate it if you could explain these.

[Figure: peak signal heatmap]

atac seq deeptools normalization • 4.6k views

updated 4.7 years ago by ATpoint 85k • written 4.7 years ago by kinalimeric ▴ 40

score 6 · Accepted Answer · 2020-03-12

In some cases normalizing only for sequencing depth might be enough. Often it is not, because libraries differ in composition and in signal-to-noise ratio. I prefer to scale my bigWig files (or whatever counts you want to normalize) with the normalization factors from edgeR. Code examples and some details are in the linked thread: A: ATAC-seq sample normalization (quantil normalization)
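As a rough illustration of the scaling idea above, here is a minimal Python sketch that turns edgeR-style library sizes and normalization factors into per-sample values for the `--scaleFactor` option of bamCoverage. The sample names and numbers are hypothetical, and the formula (reciprocal of the effective library size in millions) is one common convention, not necessarily exactly what the linked thread does.

```python
def bamcoverage_scale_factors(library_sizes, norm_factors):
    """Return per-sample factors intended for bamCoverage --scaleFactor.

    Effective library size = library size * normalization factor.
    The scale factor is its reciprocal, expressed per million reads.
    """
    factors = {}
    for sample, lib_size in library_sizes.items():
        effective = lib_size * norm_factors[sample]
        factors[sample] = 1e6 / effective
    return factors


# Hypothetical values; in practice both would be taken from edgeR
# (library sizes and the output of calcNormFactors).
library_sizes = {"ctrl_rep1": 20_000_000, "ctrl_rep2": 25_000_000}
norm_factors = {"ctrl_rep1": 1.25, "ctrl_rep2": 0.8}

scale = bamcoverage_scale_factors(library_sizes, norm_factors)
for sample, factor in scale.items():
    # Print the command rather than running it; file names are placeholders.
    print(f"bamCoverage -b {sample}.bam -o {sample}.bw --scaleFactor {factor:.6f}")
```

When a custom scale factor is supplied like this, bamCoverage should be run without an additional built-in normalization such as CPM, otherwise the data would be scaled twice.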

score 2 · Accepted Answer · 2020-03-12

2

4.7 years ago

Devon Ryan 104k

I wouldn't expect the peak heights to be identical; some amount of biological variation is normal. The goal of the normalization should instead be to set the background level to roughly similar values between samples.

You do not need to normalize your BAM files; csaw or DiffBind will take care of that step for you.

If you want to downsample, you can use either samtools view -s if starting from BAM files or seqtk if starting from FASTQ files. This is generally not needed.
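If downsampling is wanted anyway, the `samtools view -s` route can be sketched as follows. The `-s` option takes a value of the form SEED.FRACTION, so the only real work is computing the fraction per file. This is a minimal sketch that only builds the commands; the file names, read counts, and the choice to downsample everything to the smallest library are assumptions for illustration.

```python
def samtools_subsample_commands(read_counts, seed=42):
    """Build samtools commands that downsample every BAM to the smallest library.

    read_counts: dict mapping BAM file name -> number of reads in that file.
    """
    target = min(read_counts.values())
    commands = []
    for bam, n_reads in read_counts.items():
        fraction = round(target / n_reads, 4)
        if fraction >= 1.0:
            continue  # already the smallest library, nothing to do
        # samtools reads -s as SEED.FRACTION, e.g. 42.2500 = seed 42, keep 25%
        seed_frac = f"{seed}.{round(fraction * 10000):04d}"
        out = bam.replace(".bam", ".sub.bam")
        commands.append(f"samtools view -b -s {seed_frac} -o {out} {bam}")
    return commands


# Hypothetical read counts (e.g. as reported by samtools flagstat)
cmds = samtools_subsample_commands({"a.bam": 40_000_000, "b.bam": 10_000_000})
for cmd in cmds:
    print(cmd)
```

Using the same seed for every file keeps the subsampling reproducible, and because the selection is based on the read name, both mates of a pair are kept or dropped together.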

4.7 years ago by Devon Ryan 104k

0

Thank you so much for your help!

4.7 years ago by kinalimeric ▴ 40

1

I disagree in part: it is the majority of peaks that should be similar between samples, not the background. Libraries can have quite different background noise levels due to technical artifacts. The assumption behind normalization strategies such as TMM from edgeR or RLE from DESeq2 is that a large number of regions (peaks) do not change between conditions. The goal of the normalization is to find a size factor that centers these regions on a log fold change of roughly zero between samples. This is important in ATAC-seq but even more important in assays like ChIP-seq, where antibody pulldown efficiency can differ strikingly between samples, so background levels can vary a lot even though the peaks are actually not changing much.

4.7 years ago by ATpoint 85k

1

I think whether you should normalize so that the peaks are most similar or so that the background is most similar depends a bit on the experiment. I've worked with a lot of people perturbing things in ways where I expect a large change in peaks. Given that, normalizing for similar backgrounds makes the most sense. If, however, one expects more modest or targeted changes, then I completely agree that normalizing over peaks is preferable.

4.7 years ago by Devon Ryan 104k

1

Agreed. In my experience it depends on the context. If you have differences in signal-to-noise ratio, go for peak normalization. If you have very different composition, go for background. If you are unlucky and have both effects combined, say a ChIP for H3K27ac in a very early cell and a terminally differentiated one plus very different antibody efficiencies between the conditions, then try both methods and see which does better at pushing the majority of regions towards a log FC of zero. Maybe also inspect regions that you know do not change, either on a genome browser or by plotting counts manually. It is a trade-off.
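One simple way to do the "try both and compare" check is to count how many regions end up close to a log2 fold change of zero under each set of size factors. The sketch below is only an illustration: the counts, the two sets of size factors, the 0.5 threshold, and the pseudocount are all hypothetical choices.

```python
import math


def fraction_near_zero(counts_a, counts_b, sf_a, sf_b, max_abs_log2fc=0.5, pseudo=1.0):
    """Fraction of regions whose normalized log2 fold change (B vs A) is near zero."""
    near = 0
    for a, b in zip(counts_a, counts_b):
        # The pseudocount avoids division by zero for empty regions
        log2fc = math.log2((b / sf_b + pseudo) / (a / sf_a + pseudo))
        if abs(log2fc) <= max_abs_log2fc:
            near += 1
    return near / len(counts_a)


# Made-up peak counts for two samples
counts_a = [100, 200, 50, 400, 80]
counts_b = [200, 400, 100, 800, 1600]

# Hypothetical size factors from a peak-based and a background-based method
peak_based = fraction_near_zero(counts_a, counts_b, sf_a=1.0, sf_b=2.0)
background_based = fraction_near_zero(counts_a, counts_b, sf_a=1.0, sf_b=1.0)
print(f"peak-based: {peak_based:.2f}, background-based: {background_based:.2f}")
```

In this toy example the peak-based factors center most regions on zero, but with real data the comparison can go either way, which is why checking known unchanged regions by eye remains useful.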

4.7 years ago by ATpoint 85k