Question

Getting different values from deeptools bamCoverage CPM normalization, and from manually normalizing using samtools mpileup

0

Entering edit mode

4.2 years ago

rjd254 • 0

I've generated some ATAC-seq data and wanted to graph it. Its singly mapped and deduplicated already so I ran samtools mpileup on it. Then I divided each position by the number of reads in millions to get the CPM of each position. I've then tried to graph a larger region, using pyGenomeTracks and bigwig files generated from bamCoverage in deeptools v3.4.3. I ran this command

bamCoverage --normalizeUsing CPM -p 32 -bs 1 -b 01_aligned.dedup.single.sortedbycoord.bam -o 01_aligned.dedup.single.cox2.bw

and then I ran with exact scaling and extended reads

bamCoverage --normalizeUsing CPM -e --exactScaling -p 32 -bs 1 -b 01_aligned.dedup.single.sortedbycoord.bam -o coverage_bw_exact/sample_01_exactscaling_coverage.bw

Both bw, when plotted with pyGenomeTracks gave me peaks higher than my CPM normalized data from mpileup (which I'm inclined to believe due to the simplicity of the normalization of CPM).

Is there a more detailed manual on how deeptools works? I'm not sure what's not working here, while I have a suspicion it is bamCoverage not working as expected I'm not sure. I'd really appreciate some help. Thanks.

deeptools ATAC-seq bamCoverage normalization CPM • 3.4k views

ADD COMMENT • link updated 4.2 years ago by ATpoint 85k • written 4.2 years ago by rjd254 • 0

score 1 · Answer 1 · 2020-09-02

1

Entering edit mode

4.2 years ago

ATpoint 85k

I suggest you follow my answer here to normalize your data. A: ATAC-seq sample normalization (quantil normalization)

It is more robust than simple per-million scaling.

ADD COMMENT • link 4.2 years ago by ATpoint 85k

0

Entering edit mode

Understood, but the regions we're looking at look properly normalized with CPM. I've done normalization with 1x coverage (RPGC) and CPM and noticed very similar trends in these areas. I'm mostly trying to reconcile the different scaling between manual CPM through samtools and bamCoverage, despite CPM not being the most optimal way to normalize all data from ATAC-seq.

ADD REPLY • link 4.2 years ago by rjd254 • 0