Question

How to normalize the intensity of bigwig files based on a group of house-keeping genes in ATAC-seq data?

0

2.2 years ago

Dan ▴ 180

My ATAC-seq samples differ vastly in read number. bamCoverage only offers RPKM, CPM, BPM, RPGC, and None as normalization methods (https://deeptools.readthedocs.io/en/develop/content/tools/bamCoverage.html). How can I normalize the intensity of bigwig files based on a group of housekeeping genes (e.g. Gapdh, Actin, Rpl27, ...), which should have the same intensity in different samples? Thanks

ATAC bigwig • 2.0k views

updated 2.2 years ago by ATpoint 85k • written 2.2 years ago by Dan ▴ 180

1

Technically, you need a .bed file with the coordinates of your housekeeping gene regions and then calculate the coverage only within these regions, e.g. with bedtools coverage. From these counts, derive a scaling factor for each sample and normalize your data with it.
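To illustrate the last step, here is a minimal Python sketch. It assumes the per-region read counts have already been extracted (for instance from the bedtools coverage output) into one list per sample; the function name, the sample names and the numbers are made up for illustration. It scales every sample so that its total housekeeping coverage matches the mean across samples.

```python
from statistics import mean


def housekeeping_scale_factors(counts):
    """Derive one multiplicative scale factor per sample.

    counts: dict mapping sample name -> list of read counts, one entry per
            housekeeping region (hypothetical input, same region order
            in every sample).
    """
    # Total reads falling into the housekeeping regions, per sample.
    totals = {sample: sum(values) for sample, values in counts.items()}
    if any(total <= 0 for total in totals.values()):
        raise ValueError("every sample needs reads in the housekeeping regions")
    # Bring every sample to the mean housekeeping coverage across samples.
    target = mean(list(totals.values()))
    return {sample: target / total for sample, total in totals.items()}


# Made-up counts: sampleB was sequenced roughly 10x deeper than sampleA.
example_counts = {
    "sampleA": [120, 300, 80],
    "sampleB": [1200, 3000, 800],
}
factors = housekeeping_scale_factors(example_counts)
```

Each factor could then be passed to bamCoverage via its --scaleFactor option when generating the bigwig for that sample.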

However, I advise against pursuing this approach. Housekeeping-gene normalization is used for qPCR or Western blots owing to the limited options in the wet-lab setting. Housekeeping gene expression is not as stable as you might presume, and it is probably even less meaningful for ATAC-seq data.

Have you already checked whether the skewed read numbers are attributable to mtDNA contamination?
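One quick way to check this is to look at the per-chromosome read counts reported by samtools idxstats, which prints tab-separated lines of reference name, length, mapped reads and unmapped reads. The sketch below parses such text in Python; the mitochondrial contig names it looks for and the example numbers are assumptions, so adjust them to your reference.

```python
def mito_fraction(idxstats_text, mito_names=("chrM", "MT", "chrMT", "M")):
    """Fraction of mapped reads on the mitochondrial contig.

    idxstats_text: text output of `samtools idxstats sample.bam`
    mito_names:    assumed contig names for the mitochondrial genome
    """
    total_mapped = 0
    mito_mapped = 0
    for line in idxstats_text.strip().splitlines():
        name, _length, mapped, _unmapped = line.split("\t")
        total_mapped += int(mapped)
        if name in mito_names:
            mito_mapped += int(mapped)
    return mito_mapped / total_mapped if total_mapped else 0.0


# Made-up idxstats output for illustration only.
example_idxstats = "chr1\t195471971\t9000\t0\nchrM\t16299\t1000\t0\n*\t0\t0\t50\n"
fraction = mito_fraction(example_idxstats)
```

If this fraction differs strongly between samples, the mitochondrial reads could explain part of the apparent depth difference.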

2.2 years ago by Matthias Zepper 5.0k

0

Hi Mat

Thanks very much. Your suggestion is very helpful. The skewed read numbers between samples were caused by a robotic error when the samples were pooled: some samples have more than 10-fold more reads than others and than what I asked for. There is no problem with data quality. For this kind of skewed read number, what do you think is the best normalization method? Thanks

2.2 years ago by Dan ▴ 180

1

You are welcome. Since I have little experience with ATAC-seq data analysis, my favored approach would be to analyze the data with a tried and tested pipeline, or at least to perform the normalization the way such a pipeline does.

If you nonetheless wish to use some sort of targeted, custom normalization, a viable approach in my view would be to use reference data of open chromatin regions for your particular cell type. Use the complement of these regions (or possibly the LADs of your cell type, if no such reference is available) as the blacklist when running deeptools bamCoverage in conjunction with RPGC normalization.
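A rough Python sketch of that idea, under stated assumptions: the open regions are given as 0-based, half-open (chrom, start, end) tuples, the chromosome sizes as a dict, and the helper names, file names and the effective genome size are placeholders. The first function builds the complement to be written out as the blacklist BED; the second only assembles the bamCoverage call (it does not run it), using the --normalizeUsing, --effectiveGenomeSize and --blackListFileName options.

```python
def complement(regions, chrom_sizes):
    """Return all intervals NOT covered by `regions`.

    regions:     list of (chrom, start, end), 0-based half-open
    chrom_sizes: dict chrom -> chromosome length
    """
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    gaps = []
    for chrom, size in chrom_sizes.items():
        pos = 0  # end of the covered stretch seen so far
        for start, end in sorted(by_chrom.get(chrom, [])):
            if start > pos:
                gaps.append((chrom, pos, start))
            pos = max(pos, end)  # also handles overlapping regions
        if pos < size:
            gaps.append((chrom, pos, size))
    return gaps


def bamcoverage_command(bam, bigwig, blacklist_bed, effective_genome_size):
    """Assemble (not execute) a bamCoverage call with RPGC and a blacklist."""
    return [
        "bamCoverage", "-b", bam, "-o", bigwig,
        "--normalizeUsing", "RPGC",
        "--effectiveGenomeSize", str(effective_genome_size),
        "--blackListFileName", blacklist_bed,
    ]


# Made-up open chromatin regions and chromosome sizes.
open_regions = [("chr1", 100, 200), ("chr1", 150, 300), ("chr1", 500, 600)]
blacklist = complement(open_regions, {"chr1": 1000, "chr2": 50})
cmd = bamcoverage_command("sample.bam", "sample.bw", "closed.bed", 123456789)
```

The effective genome size passed here is a dummy value; it has to be chosen for your genome and read length.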

2.2 years ago by Matthias Zepper 5.0k

0

2.2 years ago

ATpoint 85k

Here is what I prefer to do: ATAC-seq sample normalization

If DE profiles are extensive between samples, or you need a set of reliable regions, I tend to use the top x% of regions with the highest average counts across all samples, e.g. the top 20%. That enriches for regions unlikely to be differential; it is basically the far-right part of the points on an MA-plot. Subset to those regions, calculate DESeq2 size factors on them, use MA-plots to see how it performs, then scale the bigwigs with these factors as in the link.
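To make the idea concrete, here is a plain-Python sketch. It is not DESeq2 itself but a small re-implementation of the median-of-ratios idea behind its size factors, applied to the top fraction of regions by mean count. The count matrix, function names and numbers are invented for illustration; in practice you would use DESeq2 on your real peak count matrix.

```python
import math
from statistics import mean, median


def top_regions(count_matrix, fraction=0.2):
    """Keep the rows (regions) with the highest mean count across samples."""
    ranked = sorted(count_matrix, key=mean, reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


def size_factors(count_matrix):
    """Median-of-ratios size factors, one per sample (column)."""
    n_samples = len(count_matrix[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in count_matrix:
        if min(row) <= 0:
            continue  # regions with a zero count have no finite geometric mean
        log_geo_mean = mean([math.log(c) for c in row])
        for j, count in enumerate(row):
            log_ratios[j].append(math.log(count) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no region has non-zero counts in all samples")
    return [math.exp(median(r)) for r in log_ratios]


# Made-up counts: rows are regions, columns are two samples.
counts = [
    [100, 1000], [200, 2000], [50, 500], [5, 80], [2, 1],
    [0, 3], [7, 20], [1, 0], [30, 150], [4, 4],
]
factors = size_factors(top_regions(counts, fraction=0.2))
# bigwigs are scaled with the reciprocal of the size factor
scale_factors = [1 / f for f in factors]
```

The reciprocals are what you would hand to bamCoverage via --scaleFactor, since that option multiplies the coverage.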

2.2 years ago by ATpoint 85k

score 3 · Accepted Answer · 2022-09-02
