Dear fellows:
I have some ATAC-seq BAM files (A549 cells, drug-treated and control conditions) with different sequencing depths that I need to convert to bigWig.
The tool I use for the conversion is deepTools bamCoverage, and I use deepTools multiBamSummary to generate a scale factor for each BAM file. I then run bamCoverage with `--normalizeUsing RPGC` and the `--scaleFactor` of each BAM file to generate the bigWig.
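For reference, the arithmetic behind RPGC can be sketched as below. This is a toy illustration based on how the deepTools documentation describes RPGC (scaling to 1x average genome coverage), not the library's actual code; the read counts, fragment length and genome size are invented for the example.

```python
# Toy sketch of RPGC ("1x coverage") scaling. All numbers are hypothetical.

def rpgc_scale_factor(mapped_reads, fragment_length, effective_genome_size):
    """Factor that rescales raw coverage so the genome-wide average becomes 1x."""
    current_coverage = mapped_reads * fragment_length / effective_genome_size
    return 1.0 / current_coverage

effective_genome_size = 2_900_000_000  # hypothetical, roughly human-sized
fragment_length = 100                  # hypothetical

# Two libraries that differ only in sequencing depth.
control_factor = rpgc_scale_factor(20_000_000, fragment_length, effective_genome_size)
treated_factor = rpgc_scale_factor(40_000_000, fragment_length, effective_genome_size)

# The deeper library gets a proportionally smaller factor,
# so RPGC on its own already removes the depth difference.
print(control_factor, treated_factor, control_factor / treated_factor)
```

Under these assumptions the library with twice the depth receives half the factor, which is why a second, depth-based scale factor on top of RPGC would correct for depth twice.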
After generating the bigWig file for each sample, I load them into IGV. My purpose is to display different peak pileup heights (e.g. higher pileups in treated samples and lower pileups in control) at known differentially accessible positions between those samples.
Here are my questions:
- Does my normalization approach (using `RPGC` and scale factors when converting BAM to bigWig with bamCoverage) make sense?
- When I load the already normalized bigWig files into IGV without adjusting any IGV parameters, does the displayed track height of each sample reflect the expected trend (a higher track means more actual read pileup and vice versa), or do I still need to use IGV's autoscale function or set the data range to a fixed value so that the peaks fit the trend?
Here are two figures related to my question. Fig. A and Fig. B both show bigWig tracks normalized with `RPGC` and a scaling factor for the positive-treated, negative-treated and control samples. The red rectangles mark the expected differentially accessible positions, where the tracks should show different heights.
In Fig. A, I applied IGV's autoscale function to the 3 tracks; in Fig. B, I set the data range of all 3 tracks to 50. Position 1 in the two figures gives conflicting interpretations: in the autoscaled tracks of Fig. A, the peak heights seem the same, but in Fig. B, where the tracks share the same data range, the negative_treated track has the highest peak.
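The discrepancy between the two views can be reproduced with a toy example. The sketch below assumes that autoscale picks the y-axis maximum separately for each track from the data in view, while a fixed data range applies the same maximum to every track; the track values are invented and `drawn_peak_height` is a hypothetical helper, not an IGV function.

```python
# Three toy tracks of already normalized signal per bin (invented values).
tracks = {
    "control": [2, 5, 10, 4],
    "positive_treated": [3, 8, 20, 6],
    "negative_treated": [4, 12, 40, 9],
}

def drawn_peak_height(values, y_max):
    """Fraction of the track panel filled by the tallest bin, clipped at the top."""
    return min(max(values) / y_max, 1.0)

# Autoscale: each track gets its own y-axis maximum,
# so the tallest peak of every track fills the whole panel.
autoscaled = {name: drawn_peak_height(v, max(v)) for name, v in tracks.items()}

# Shared fixed range (0 to 50 on all tracks):
# drawn heights stay proportional to the underlying signal.
fixed_range = {name: drawn_peak_height(v, 50) for name, v in tracks.items()}

print("autoscaled: ", autoscaled)
print("fixed range:", fixed_range)
```

With per-track autoscaling all three peaks are drawn at full height even though the signal differs fourfold, so only the axis labels carry the difference. With a shared range the drawn heights can be compared directly.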
Fig. A: IGV autoscaled
Fig. B: IGV with the same data range (50) set on all tracks

Can I have your advice on solving my issues?
Many thanks!
Hi @Carlo Yague,
Thank you so much for the quick reply!
One more detail I would like to confirm: when normalizing, should I use `RPGC` normalization alone, or `RPGC` + the scaling factor of each BAM?

Ha sorry, I misread this in the original post. You should use either RPGC or a scaling factor (which I guess is based on the total number of mapped reads?), not both (otherwise you would normalize twice).
In most cases, RPGC is a good enough normalization for visualization purposes, but the method ATpoint mentioned, using (robust) scaling factors, is more robust to differences in library composition and will be more quantitative.
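As a rough sketch of the "either RPGC or a scaling factor, not both" advice, the two alternatives could be written as separate bamCoverage calls. The snippet only builds and prints the command lines; the file names, bin size, effective genome size and scale factor are placeholders I made up, and whether you pass a factor or its reciprocal depends on how the factor was computed.

```python
import shlex

# Hypothetical inputs: none of these values come from this thread.
BAM = "treated_rep1.bam"
EFFECTIVE_GENOME_SIZE = 2913022398  # assumed; look up the value for your assembly
EXTERNAL_SCALE_FACTOR = 0.8         # assumed; e.g. derived from a count matrix

def rpgc_command(bam, out, genome_size, bin_size=10):
    """Option 1: let bamCoverage normalize to 1x coverage, with no extra factor."""
    return ["bamCoverage", "--bam", bam, "--outFileName", out,
            "--binSize", str(bin_size),
            "--normalizeUsing", "RPGC",
            "--effectiveGenomeSize", str(genome_size)]

def scale_factor_command(bam, out, scale_factor, bin_size=10):
    """Option 2: apply only an externally computed factor, with no built-in normalization."""
    return ["bamCoverage", "--bam", bam, "--outFileName", out,
            "--binSize", str(bin_size),
            "--scaleFactor", str(scale_factor)]

rpgc_cmd = rpgc_command(BAM, "treated_rep1.rpgc.bw", EFFECTIVE_GENOME_SIZE)
scaled_cmd = scale_factor_command(BAM, "treated_rep1.scaled.bw", EXTERNAL_SCALE_FACTOR)

print(shlex.join(rpgc_cmd))
print(shlex.join(scaled_cmd))
```

Keeping the two options in separate functions makes it harder to combine `--normalizeUsing RPGC` with `--scaleFactor` by accident, which is the double normalization described above.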
Thank you so much for your clear explanation, @Carlo Yague, and for the more robust normalization method from ATpoint.
Hi Carlo, would this normalization strategy (`--normalizeUsing RPGC`) also be correct for normalizing RNA-Seq BAM files that are expected to have a global shift in gene expression across groups?

To normalize counts, I am fitting expected counts to a loess regression calculated using spike-ins, given that TPMs or FPKMs would be biased by this global shift in gene expression. Because of this, I think the same applies when creating my bigWigs (i.e. BPMs or RPKMs per bin would be biased). However, just for visualization, it seems that this normalization approach with RPGC is applicable. I am very new to RNA-Seq and would really appreciate a confirmation.
Thanks so so much
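One way to reason about the global-shift question is a toy simulation. It is a sketch under strong assumptions, not a confirmation for real data: the abundances are invented, every gene in the treated sample is halved, both samples receive the same amount of spike-in and the same sequencing depth, and a simple ratio to spike-in reads stands in for the loess fit. Depth-based normalizations, RPGC included, divide by the number of mapped reads, which is what `depth_normalized` imitates.

```python
# Invented true RNA amounts; treated cells express every gene at half the level.
control_true = {"geneA": 100, "geneB": 200, "geneC": 700, "spike": 100}
treated_true = {"geneA": 50, "geneB": 100, "geneC": 350, "spike": 100}

def sequence(abundances, depth):
    """Distribute a fixed number of reads in proportion to abundance."""
    total = sum(abundances.values())
    return {name: amount * depth / total for name, amount in abundances.items()}

def depth_normalized(reads):
    """Divide by the total endogenous reads (the idea behind depth-based scaling)."""
    endogenous_total = sum(v for k, v in reads.items() if k != "spike")
    return {k: v / endogenous_total for k, v in reads.items() if k != "spike"}

def spike_normalized(reads):
    """Divide by the spike-in reads (a simplified spike-in scaling)."""
    return {k: v / reads["spike"] for k, v in reads.items() if k != "spike"}

control_reads = sequence(control_true, depth=1_000_000)
treated_reads = sequence(treated_true, depth=1_000_000)

depth_ratio = depth_normalized(treated_reads)["geneC"] / depth_normalized(control_reads)["geneC"]
spike_ratio = spike_normalized(treated_reads)["geneC"] / spike_normalized(control_reads)["geneC"]

print(round(depth_ratio, 3), round(spike_ratio, 3))
```

In this toy setup the depth-normalized ratio comes out at about 1, so the halving disappears, while the spike-in ratio recovers about 0.5. If real data behave similarly, a spike-in-derived factor passed through `--scaleFactor` would preserve the shift in the bigWigs, whereas RPGC alone would not.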