4.1 years ago
pixie@bioinfo
Hello, I have 3 ATAC-seq samples that I normalized against each other using the MAnorm2 algorithm (https://github.com/tushiqi/MAnorm2). The output is a set of peak regions with the corresponding normalized read counts. I also have a list of gene locations of interest.
However, for visualization I would like to use deepTools to plot a profile or heatmap over these gene locations. This requires computeMatrix, which in turn needs bigWig files as input. How can I go about this? Is there an alternative way to normalize the BAM files of the 3 samples?
Thanks
This post provides example code for deriving scaling factors to normalize bigWig files (e.g. with deepTools), which can then be used for the profiles: A: ATAC-seq sample normalization (quantile normalization)
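As a rough illustration of what such scaling factors are, the sketch below computes DESeq-style median-of-ratios size factors from a raw count matrix (rows = peaks or bins, columns = samples) and inverts them so they could be passed to deepTools `bamCoverage --scaleFactor`. This is a swapped-in, simplified technique, not necessarily the method used in the linked post (edgeR's TMM factors are a common alternative), and the function names are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (DESeq-style), one per sample.

    counts: list of rows, one per region (peak or bin); each row holds the raw
    read counts of every sample in the same column order.
    """
    n_samples = len(counts[0])
    # Only regions with a non-zero count in every sample have a finite geometric mean.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no region has non-zero counts in all samples")
    # Log geometric mean of each region across samples acts as a pseudo-reference.
    log_geo = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of sample j to the pseudo-reference for every usable region.
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo)]
        # The median ratio is robust to the minority of regions that truly change.
        factors.append(math.exp(median(log_ratios)))
    return factors


def bamcoverage_scale_factors(counts):
    """Reciprocal size factors: deeper libraries get scaled down, shallower ones up."""
    return [1.0 / sf for sf in size_factors(counts)]
```

For example, if the three samples get scale factors of 2.0, 1.0 and 0.5, one would run `bamCoverage -b sample1.bam -o sample1.bw --scaleFactor 2.0` and so on (file names are placeholders), leaving `--normalizeUsing` unset so that only the supplied factor is applied.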
Thanks. Actually, we are more interested in normalizing across samples than in differential analysis. I will have a look at the scaling factors you linked!
Hello, just to clarify: I should use bedtools makewindows with the chromosome length file to generate a feature file of 10 kb windows, then use featureCounts to build the count matrix, which is then used in edgeR or DESeq, right? Thanks
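For the bin-based route, the sketch below mimics what `bedtools makewindows -g chrom.sizes -w 10000` produces (non-overlapping 10 kb windows, the last one truncated at the chromosome end) and writes the windows in featureCounts' SAF format. SAF is 1-based and inclusive while BED is 0-based and half-open, so each start is shifted by one. The helper names and the `chrom_start_end` window IDs are hypothetical; running bedtools directly is usually simpler.

```python
def read_chrom_sizes(lines):
    """Parse a chromosome length file with 'chrom length' per line."""
    sizes = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        sizes.append((fields[0], int(fields[1])))
    return sizes


def make_windows(chrom_sizes, width=10_000):
    """Yield non-overlapping fixed-width windows as 0-based, half-open (chrom, start, end).

    The last window of each chromosome stops at the chromosome end.
    """
    for chrom, length in chrom_sizes:
        for start in range(0, length, width):
            yield chrom, start, min(start + width, length)


def to_saf(windows):
    """Convert BED-style windows to SAF lines (1-based, inclusive coordinates)."""
    out = ["GeneID\tChr\tStart\tEnd\tStrand"]
    for chrom, start, end in windows:
        # The strand column is required by SAF; "+" is a placeholder here,
        # intended for unstranded counting.
        out.append(f"{chrom}_{start}_{end}\t{chrom}\t{start + 1}\t{end}\t+")
    return out
```

The SAF lines can be written to a file (e.g. `windows.saf`) and passed to `featureCounts -F SAF -a windows.saf -o counts.txt sample1.bam sample2.bam sample3.bam`; the resulting count matrix then goes into edgeR or DESeq. For the peak-based option, the same `to_saf` conversion could be applied to a merged set of peak intervals instead of windows.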
Yes, that would be one option. The other option would be to use the peaks as the template for the count matrix. In the end, you should check with MA-plots which approach makes more sense. See also, for example, the csaw vignette at Bioconductor for a discussion of the normalization options (it discusses ChIP-seq, but the same applies to ATAC-seq). The bin-based strategy is supposed to work better when library composition is very different between samples, and the peak-based strategy should work better (in most cases) when data quality differs notably between runs. I usually check the peak-based approach first and only try something else if the MA-plots indicate that something is off, such as the majority of data points not being centered along y ≈ 0, or any kind of strange, non-symmetrical distribution.
Thanks so much for the explanations. I went through the paper for the "csaw" suggestion; the data exploration is worth the time.
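To check the MA-plot criterion numerically, the sketch below computes M (log2 ratio) and A (mean log2 abundance) for two samples counted over the same regions, optionally dividing each sample by a size factor first. The 0.5 pseudocount and the function names are assumptions. If the normalization worked and most regions are unchanged, the median of M should sit close to 0.

```python
import math
from statistics import median


def ma_values(counts_x, counts_y, size_x=1.0, size_y=1.0, pseudocount=0.5):
    """Return (M, A) lists for two samples counted over the same regions.

    Each count is divided by its sample's size factor and offset by a
    pseudocount so that zero counts do not produce log2(0).
    M = log2(y) - log2(x); A = mean of the two log2 values.
    """
    m_vals, a_vals = [], []
    for x, y in zip(counts_x, counts_y):
        lx = math.log2(x / size_x + pseudocount)
        ly = math.log2(y / size_y + pseudocount)
        m_vals.append(ly - lx)
        a_vals.append((lx + ly) / 2)
    return m_vals, a_vals


def m_offset(m_vals):
    """Median M: close to 0 when the bulk of regions is centered as expected."""
    return median(m_vals)
```

The points can then be drawn with e.g. matplotlib's `plt.scatter(a, m, s=2)`; a cloud clearly shifted away from M = 0, or skewed, is the kind of warning sign described above.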