5.5 years ago
a.rex
I have perhaps a naive question:
In RNA-seq with Kallisto, est_counts are generated and TPMs per transcript can be calculated. However, to compare between samples, sleuth is used to calculate between-sample normalization factors. Therefore, the TPMs of the individual libraries need to be scaled before they can be compared.
In ATAC-seq, I have seen papers that generate coverage tracks normalized to RPKM (given a set bin size). This is taken as sufficient to compare between samples. Why is no scale factor generated in this case?
The literature is full of flawed analyses. RPKM might be acceptable if you do not expect any large changes in library composition and only depth normalization is required. Still, you should perform any differential analysis within a proper statistical framework, and none that I know of accepts RPKM. They take raw counts as input, which are then used to calculate proper scaling/size factors. I suggest you perform your differential analysis with the established tools. For ATAC-seq one could use either DESeq2, edgeR or csaw (which uses edgeR internally). All of them have elaborate normalization strategies that will generate size factors. Go through the manuals of the tools and choose the one you feel comfortable with. The typical workflow I do is:
1. Peak calling with macs2 using the --call-summits option (csaw, for example, suggests using sliding windows to completely avoid peak calling for differential analysis).
2. Read counting over the peaks with regionCounts from csaw.
3. Normalization as described in the csaw manual.
4. The edgeR workflow, again as suggested in the csaw manual.
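
To make the point about library composition concrete, here is a small Python sketch with made-up counts. It is not taken from any of the tools above. It compares depth-only RPKM scaling with a median-of-ratios size factor, which is the general idea behind DESeq2-style normalization (the real implementations handle zeros, filtering and more). The function names, the 500 bp bin size and the counts are all assumptions chosen for illustration.

```python
import math
from statistics import median

# Hypothetical toy data: raw read counts for 6 equal-width regions.
# Regions 0-4 are truly unchanged; sample B is simply sequenced 2x deeper.
# Region 5 is strongly more accessible in sample B (composition change).
REGION_LEN_KB = 0.5  # assumed 500 bp bins
sample_a = [100, 200, 300, 400, 500, 500]
sample_b = [200, 400, 600, 800, 1000, 9000]


def rpkm(counts, length_kb):
    """Reads per kilobase per million reads: corrects for depth only."""
    per_million = sum(counts) / 1e6
    return [c / per_million / length_kb for c in counts]


def median_of_ratios_size_factors(samples):
    """Median-of-ratios size factors, one per sample.

    Each count is compared with the geometric mean of its region across
    samples; the median of these ratios is the sample's size factor.
    This sketch assumes every count is > 0.
    """
    n_regions = len(samples[0])
    log_geo_means = [
        sum(math.log(s[i]) for s in samples) / len(samples)
        for i in range(n_regions)
    ]
    return [
        math.exp(median(math.log(s[i]) - log_geo_means[i] for i in range(n_regions)))
        for s in samples
    ]


# Fold change B/A per region after RPKM scaling
rpkm_a = rpkm(sample_a, REGION_LEN_KB)
rpkm_b = rpkm(sample_b, REGION_LEN_KB)
rpkm_fc = [b / a for a, b in zip(rpkm_a, rpkm_b)]

# Fold change B/A per region after dividing by size factors
sf_a, sf_b = median_of_ratios_size_factors([sample_a, sample_b])
sf_fc = [(b / sf_b) / (a / sf_a) for a, b in zip(sample_a, sample_b)]

print("RPKM fold changes:       ", [round(x, 3) for x in rpkm_fc])
print("Size-factor fold changes:", [round(x, 3) for x in sf_fc])
```

In this toy case the five unchanged regions come out at about one third of their true level in sample B after RPKM scaling, because the single strongly gained region inflates B's total read count. With the median-of-ratios size factors they come out at a fold change of 1, and only region 5 shows a change.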