The simplest solution is in fact to use your first two lines of code and then run `edgeR::cpm(d, log=FALSE)` to get normalized counts; see `?cpm` for details. If the normalization factors have been calculated beforehand, `cpm()` will use them; if not, it only corrects for differences in library size. I agree that this can all be confusing, because there are in fact multiple ways to calculate CPMs in edgeR.
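A minimal sketch of the suggestion above, assuming your first two lines were `DGEList()` followed by `calcNormFactors()` (your code is not shown here, so that is a guess) and using simulated toy counts instead of real data. It checks that `cpm(d, log = FALSE)` equals the raw counts divided by the effective library size (`lib.size * norm.factors`) and scaled to one million. The variable names `counts`, `eff_lib` and `manual` are illustrative only.

```r
library(edgeR)

# Hypothetical toy data: 1000 genes x 4 samples, Poisson counts
set.seed(1)
n_genes <- 1000
n_samples <- 4
lambda <- 10^runif(n_genes, 0, 5)   # per-gene expression level
counts <- matrix(rpois(n_genes * n_samples, rep(lambda, times = n_samples)),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("sample", seq_len(n_samples))))

d <- DGEList(counts = counts)
d <- calcNormFactors(d)             # TMM factors, stored in d$samples$norm.factors

# Normalized counts per million (uses the norm factors computed above)
cpms <- cpm(d, log = FALSE)

# The same numbers by hand: divide each column by its effective
# library size (lib.size * norm.factors) and scale to one million
eff_lib <- d$samples$lib.size * d$samples$norm.factors
manual <- sweep(counts, 2, eff_lib, "/") * 1e6
```

If `calcNormFactors()` is skipped, all `norm.factors` stay at 1 and `cpm()` reduces to a plain library-size correction.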
The above suggestion is probably what most people find useful, as it gives a set of normalized counts that are simple to calculate and do not depend on the experimental design. You could then put these normalized data on the log scale, e.g. via `log2(cpms + 1)`, or alternatively use `log = TRUE` in `cpm()`. The latter gives different results than log-transforming the CPMs manually, because `cpm()` adds a prior count (`prior.count = 2` by default) before taking the log, and it can also produce values below zero (the log of a value smaller than 1 is negative). I usually find that undesirable, e.g. for plotting, where you intuitively expect the smallest possible value to be zero. The authors surely have sound reasons for implementing it the way they did, but in the end you have to see which strategy suits your analysis; I always use the `log2(cpm + 1)` approach.
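The sketch below, again on simulated toy counts (hypothetical data, with one gene forced to zero counts in every sample), contrasts the two log transformations. `log2(cpm + 1)` maps zero counts to exactly 0, whereas `cpm(d, log = TRUE)` adds a prior count (`prior.count = 2` by default, scaled in proportion to each sample's library size) before taking `log2`, so with libraries of several million reads the low end falls below zero.

```r
library(edgeR)

# Hypothetical toy data: 1000 genes x 4 samples, several million reads per sample
set.seed(1)
n_genes <- 1000
n_samples <- 4
lambda <- 10^runif(n_genes, 0, 5)
counts <- matrix(rpois(n_genes * n_samples, rep(lambda, times = n_samples)),
                 nrow = n_genes)
counts[1, ] <- 0                    # make sure one gene has zero counts everywhere

d <- calcNormFactors(DGEList(counts = counts))

# Option 1: log-transform the normalized CPMs manually with a pseudocount of 1
log_manual <- log2(cpm(d, log = FALSE) + 1)

# Option 2: let cpm() do it; a library-size-scaled prior count is added first
log_edger <- cpm(d, log = TRUE, prior.count = 2)

range(log_manual)   # lower end is exactly 0 (the all-zero gene)
range(log_edger)    # lower end is negative for these large libraries
```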
If you have questions that neither the manual nor the help pages of the functions can answer, you can open a question at support.bioconductor.org; the authors are outstandingly responsive. But please first search (e.g. with Google) to make sure it has not already been asked (many times) before.
Based on the documentation in `?cpm`, you can also calculate CPMs from `DGEGLM` objects (returned by `glmFit` or `glmQLFit`) or `DGELRT` objects (returned by `glmLRT` or `glmQLFTest`) rather than from the `DGEList`, as you do above with the `cpm` function. However, I cannot tell you in greater detail when exactly this strategy would be desirable, and there does not seem to be any documentation beyond the Details section of the help page; at least I did not find any in the manual.
I would therefore go for the suggestion in the first paragraph. I would not use the linked solution, as it does non-standard calculations on pseudo-counts, which the user guide explicitly discourages (section 2.8.7):

> The pseudo-counts are computed for a specific purpose, and their computation depends on the experimental design as well as the library sizes, so users are advised not to interpret the pseudo-counts as general-purpose normalized counts. They are intended mainly for internal use in the edgeR pipeline.
If your aim is to obtain TMM normalized counts, this Biostars post might help you: A: output TMM normalized counts with edgeR
Thank you, so it does normalise my data without doing CPM?
Yes, it tries to obtain TMM-normalized counts (edgeR does not provide them natively). If that is not your aim, however, I would choose the method described by @ATpoint below (the normalization recommended by the edgeR authors).