How to normalize ATAC-seq BAM files when converting them to bigWig with deepTools for IGV visualization
sir.outman, 2.8 years ago

Dear fellows:

I have several ATAC-seq BAM files (A549 cells, drug-treated and control) with different sequencing depths that I need to convert to bigWig.

For the conversion I use deepTools bamCoverage, and I use deepTools multiBamSummary to generate a scaling factor for each BAM file. Then I run bamCoverage with --normalizeUsing RPGC plus the --scaleFactor of each BAM file to generate the bigWigs.
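To see what this setup does numerically, here is a simplified model (an assumption, not deepTools' own code). The deepTools docs describe the RPGC divisor as (total mapped reads × fragment length) / effective genome size, and the bamCoverage help says the computed factor is multiplied by --scaleFactor. All numbers below are hypothetical.

```python
# Simplified model of the final bamCoverage factor when RPGC and --scaleFactor
# are both given. Based on the deepTools help text, not deepTools' own code;
# all numbers are hypothetical.


def rpgc_factor(mapped_reads: int, fragment_length: float, effective_genome_size: float) -> float:
    """Factor that rescales raw coverage to 1x (reads per genomic content)."""
    # 1x depth divisor = (mapped reads * fragment length) / effective genome size
    current_depth = mapped_reads * fragment_length / effective_genome_size
    return 1.0 / current_depth


def combined_factor(mapped_reads: int, fragment_length: float,
                    effective_genome_size: float, user_scale_factor: float = 1.0) -> float:
    """RPGC factor multiplied by the value passed to --scaleFactor."""
    return user_scale_factor * rpgc_factor(mapped_reads, fragment_length, effective_genome_size)


if __name__ == "__main__":
    # Hypothetical library: 40 M mapped reads, 200 bp fragments, 2.7e9 bp effective genome.
    print("RPGC only:         ", rpgc_factor(40_000_000, 200, 2.7e9))
    print("RPGC * scaleFactor:", combined_factor(40_000_000, 200, 2.7e9, user_scale_factor=0.8))
```

Under this model, a per-sample factor from multiBamSummary is applied on top of a depth normalization, even though that factor already accounts for sequencing depth.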

After generating a bigWig for each sample, I load them into IGV. My goal is to show different peak pileup heights (e.g. higher pileups in treated samples and lower pileups in controls) at known differentially accessible positions between the samples.

Here are my questions:

  1. Does my normalization approach (using RPGC together with per-sample scaling factors when converting BAM to bigWig with bamCoverage) make sense?
  2. When I load the normalized bigWig files into IGV without adjusting any IGV settings, do the displayed track heights reflect the expected trend (a higher track means more actual read pileup, and vice versa), or do I still need to use IGV's autoscale or set a data range so that the peaks show the trend?

Here are two figures related to my question. Fig. A and Fig. B both show bigWig tracks normalized with RPGC and a scaling factor for the positive-treated, negative-treated and control samples. The red rectangles mark the expected differentially accessible positions, where the tracks should show different heights.

In Fig. A, I applied IGV's autoscale to the 3 tracks; in Fig. B, I set the data range of all 3 tracks to 50. Position 1 gives conflicting interpretations in the two figures: in the autoscaled tracks of Fig. A the heights look the same, but in Fig. B, where all tracks share the same data range, the negative_treated track has the highest peak.

Fig. A: IGV autoscale
[image: ABCC3_autoscale]

Fig. B: IGV data range set to 50 for all tracks
[image: ABCC3_same_range]

Could you advise me on how to resolve these issues?

Many thanks!

Tags: bamCoverage, deeptools, bigwig, IGV, ATAC
Carlo Yague, 2.8 years ago

Yes, RPGC normalization makes sense.

For visualization in IGV, I would recommend either your option #2 (the same fixed data range for all tracks) or the 'group autoscale' feature (select all tracks, right-click, then choose Group Autoscale). With that setting, all tracks share the same range, as in your second example, except that the range automatically fits all the data. With both option #2 and group autoscale, a higher peak means more coverage, which is the correct interpretation.
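A toy model of why per-track autoscale hid the difference at position 1: with independent autoscale, each track is stretched to its own maximum, so if the differential peak is the tallest point in every track, all tracks look equally high. The coverage values are made up, and the model only assumes that a track's drawn height is value / data-range maximum; IGV itself is not involved.

```python
from typing import Dict, List, Optional


def track_heights(tracks: Dict[str, List[float]], position: int,
                  mode: str = "group", fixed_max: Optional[float] = None) -> Dict[str, float]:
    """Fraction of each track panel filled at `position` (toy model, not IGV code).

    mode="auto"  : each track scaled to its own visible maximum (per-track autoscale)
    mode="group" : all tracks scaled to the largest visible value (group autoscale)
    mode="fixed" : all tracks scaled to `fixed_max` (a manually set data range)
    """
    group_max = max(max(values) for values in tracks.values())
    heights = {}
    for name, values in tracks.items():
        if mode == "auto":
            scale_max = max(values)
        elif mode == "group":
            scale_max = group_max
        elif mode == "fixed":
            if fixed_max is None:
                raise ValueError("fixed mode needs fixed_max")
            scale_max = fixed_max
        else:
            raise ValueError(f"unknown mode: {mode}")
        # Values above the range fill the whole panel, so cap at 1.0.
        heights[name] = min(values[position] / scale_max, 1.0)
    return heights


if __name__ == "__main__":
    # Hypothetical normalized coverage in a 3-bin window; bin 1 is the differential peak.
    tracks = {"control": [2.0, 10.0, 3.0], "treated": [4.0, 30.0, 5.0]}
    for mode in ("auto", "group"):
        print(mode, track_heights(tracks, 1, mode))
    print("fixed 50", track_heights(tracks, 1, "fixed", fixed_max=50))
```

In "auto" mode both tracks fill the panel at the peak, while "group" and "fixed" keep the 3-fold difference visible.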


Hi @Carlo Yague,

Thank you so much for the quick reply!

Could you help me confirm one more detail: when normalizing, should I use RPGC normalization alone, or RPGC plus the scaling factor of each BAM?


Ah, sorry, I misread this in the original post. You should use either RPGC or a scaling factor (which I guess is based on the total number of mapped reads?), not both; otherwise you would normalize twice.
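A small sketch of the "one or the other" rule: a helper that builds a bamCoverage command applying exactly one normalization. The flags (-b, -o, --binSize, --normalizeUsing, --effectiveGenomeSize, --scaleFactor) are real bamCoverage options; the helper itself, the file names and the effective genome size are hypothetical placeholders, and the command is only printed, not run.

```python
import shlex
from typing import List, Optional


def bam_coverage_cmd(bam: str, bigwig: str, mode: str,
                     effective_genome_size: Optional[int] = None,
                     scale_factor: Optional[float] = None,
                     bin_size: int = 50) -> List[str]:
    """Hypothetical helper: build a bamCoverage command with exactly ONE normalization."""
    cmd = ["bamCoverage", "-b", bam, "-o", bigwig, "--binSize", str(bin_size)]
    if mode == "rpgc":
        if effective_genome_size is None:
            raise ValueError("RPGC needs an effective genome size")
        cmd += ["--normalizeUsing", "RPGC", "--effectiveGenomeSize", str(effective_genome_size)]
    elif mode == "scale":
        if scale_factor is None:
            raise ValueError("scale mode needs a scaling factor")
        # --normalizeUsing is left at its default (no normalization).
        cmd += ["--scaleFactor", str(scale_factor)]
    else:
        raise ValueError(f"mode must be 'rpgc' or 'scale', got {mode!r}")
    return cmd


if __name__ == "__main__":
    # Hypothetical file names and values; look up the effective genome size for your assembly.
    print(shlex.join(bam_coverage_cmd("treated.bam", "treated.bw", "rpgc",
                                      effective_genome_size=2700000000)))
    print(shlex.join(bam_coverage_cmd("treated.bam", "treated.bw", "scale", scale_factor=0.85)))
```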

In most cases, RPGC is a good enough normalization for visualization purposes, but the (robust) scaling-factor method that ATpoint mentions is more robust to differences in library composition and will be more quantitative.
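A toy example of the library-composition issue. RPGC, like per-million scaling, divides by a quantity proportional to the total number of mapped reads, so it behaves the same way here. The counts are made up: two libraries sequenced to the same depth, where the treated sample gains one strong peak and every other bin is assumed to be biologically unchanged. The "median ratio" factor is a simple two-sample stand-in for a robust scaling factor, not ATpoint's exact method.

```python
from statistics import median
from typing import List

control = [100, 100, 100, 100, 100]
treated = [50, 50, 50, 50, 300]  # the gained peak soaks up half of the reads


def per_million(counts: List[int]) -> List[float]:
    """Total-count scaling: every bin divided by the library size."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]


def median_ratio_factor(sample: List[int], reference: List[int]) -> float:
    """Median of per-bin ratios sample/reference, over bins with reads in both."""
    ratios = [s / r for s, r in zip(sample, reference) if s > 0 and r > 0]
    return median(ratios)


if __name__ == "__main__":
    cpm_c, cpm_t = per_million(control), per_million(treated)
    # Unchanged bin looks 2x lower in treated after total-count scaling.
    print("per-million, unchanged bin:", cpm_c[0], cpm_t[0])
    f = median_ratio_factor(treated, control)
    # Median-ratio scaling keeps unchanged bins equal and the gained peak higher.
    print("median-ratio, unchanged bin:", control[0], treated[0] / f)
    print("median-ratio, gained bin:   ", control[4], treated[4] / f)
```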


Thank you so much for the clear explanation, @Carlo Yague, and to ATpoint for the more robust normalization method.


Hi Carlo, would this normalization strategy (--normalizeUsing RPGC) also be correct for normalizing RNA-seq BAM files that are expected to show a global shift in gene expression across groups?

To normalize counts, I fit the expected counts to a loess regression computed from spike-ins, because TPMs or FPKMs would be biased by this global shift in gene expression. I think the same applies when creating my bigWigs (i.e. BPM or RPKM values per bin would be biased). Just for visualization, though, the RPGC approach seems applicable. I am very new to RNA-seq and would really appreciate confirmation.
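A toy sketch of the global-shift concern, with made-up counts. Every gene doubles in the treated sample, the treated library is also sequenced 1.5x deeper, and spike-ins are assumed to be added in equal amounts per sample so they reflect only technical depth. Scaling by total reads cancels the real 2x shift; a single global spike-in factor keeps it. Note that RPGC divides by (mapped reads × fragment length) / effective genome size, so it is also a total-read scaling. The spike-in factor here is a simpler stand-in for the loess fit described above, and passing such a factor to bamCoverage --scaleFactor is an assumption, not a tested recipe.

```python
from typing import Dict, List

# Hypothetical counts: 3 endogenous genes and 2 spike-ins per library.
control: Dict[str, List[int]] = {"genes": [100, 200, 300], "spikes": [40, 60]}
# Treated: 2x expression AND a 1.5x deeper library (spike-ins scale with depth only).
treated: Dict[str, List[int]] = {"genes": [300, 600, 900], "spikes": [60, 90]}


def per_million(genes: List[int]) -> List[float]:
    """Scale by total endogenous reads (the idea behind CPM/BPM/TPM-like units)."""
    total = sum(genes)
    return [g * 1e6 / total for g in genes]


def spike_in_factor(spikes: List[int], reference_spikes: List[int]) -> float:
    """Single global factor that equalizes total spike-in reads to a reference sample."""
    return sum(reference_spikes) / sum(spikes)


if __name__ == "__main__":
    print("per-million, control:", per_million(control["genes"]))
    print("per-million, treated:", per_million(treated["genes"]))  # identical: shift erased
    f = spike_in_factor(treated["spikes"], control["spikes"])
    print("spike-in scaled treated:", [g * f for g in treated["genes"]])  # about 2x control
```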

Thanks so so much

ATpoint, 2.8 years ago

I suggest not using any of these per-million scalings and doing this instead:

ATAC-seq sample normalization


If I'm not mistaken, the scaling factors provided by multiBamSummary follow a strategy like the one ATpoint pointed to. From the multiBamSummary documentation on the scaling factors it provides: "--scalingFactors: Compute scaling factors (in the DESeq2 manner) compatible for use with bamCoverage and write them to a file. The file has tab-separated columns “sample” and “scalingFactor”."

So using the scaling factor from multiBamSummary agrees with what ATpoint suggests. You should use only that factor, not scaling factor + RPGC (one or the other, but not both).
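A pure-Python sketch of DESeq2-style median-of-ratios size factors on a bins × samples count matrix. As far as I understand deepTools' source, multiBamSummary --scalingFactors writes the inverse of these size factors (so they can be multiplied into coverage via bamCoverage --scaleFactor); treat that detail as an assumption and compare with the file your deepTools version produces. The counts are hypothetical.

```python
import math
from statistics import median
from typing import List


def deseq2_size_factors(counts: List[List[float]]) -> List[float]:
    """DESeq2-style median-of-ratios size factors.

    counts[i][j] is the read count of bin (or peak) i in sample j. Bins with a
    zero in any sample are skipped, because their geometric mean is zero.
    """
    n_samples = len(counts[0])
    log_ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c <= 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            # log of (count / geometric mean of this bin across samples)
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def coverage_scale_factors(counts: List[List[float]]) -> List[float]:
    """Inverse size factors: multiply coverage by these to normalize it."""
    return [1.0 / sf for sf in deseq2_size_factors(counts)]


if __name__ == "__main__":
    # Hypothetical counts (rows = bins, columns = control, treated).
    counts = [[100, 50], [100, 50], [100, 50], [100, 50], [100, 300], [0, 12]]
    print(coverage_scale_factors(counts))
```

Because the median ignores the few strongly changed bins, unchanged bins end up equal after scaling, which is the robustness to library composition mentioned earlier in the thread.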
