The difference is subtle, but it means that vst() can perform the transformation more quickly.
vst() is, in fact, a wrapper around varianceStabilizingTransformation(): it first identifies 1000 features that are 'representative' of the dataset's dispersion trend, and then uses the information from these to perform the transformation.
The key parameter in question is:
vst(..., nsub = 1000)
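For a typical DESeq2 workflow, the two calls below should therefore produce very similar transformed matrices, with vst() being the faster of the two. This is just a sketch, assuming you already have a DESeqDataSet called dds:

```r
library(DESeq2)

# Fast: fits the dispersion trend on a deterministic subset of 1000 features
vsd_fast <- vst(dds, blind = TRUE, nsub = 1000)

# Slower: fits the dispersion trend using all features
vsd_full <- varianceStabilizingTransformation(dds, blind = TRUE)

# Both return a DESeqTransform object; extract the matrix with assay()
head(assay(vsd_fast))
```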
------------------
There is also a difference relating to the usage of blind:
vst
This is a wrapper for the varianceStabilizingTransformation (VST) that
provides much faster estimation of the dispersion trend used to
determine the formula for the VST. The speed-up is accomplished by
subsetting to a smaller number of genes in order to estimate this
dispersion trend. The subset of genes is chosen deterministically, to
span the range of genes' mean normalized count. This wrapper for the
VST is not blind to the experimental design: the sample covariate
information is used to estimate the global trend of genes' dispersion
values over the genes' mean normalized count. It can be made strictly
blind to experimental design by first assigning a design of ~1 before
running this function, or by avoiding subsetting and using
varianceStabilizingTransformation.
However, if you set blind = TRUE for vst(), it sets the design to ~ 1 for you:
function (object, blind = TRUE, nsub = 1000, fitType = "parametric")
{
    ...
    if (blind) {
        design(object) <- ~1
    }
    matrixIn <- FALSE
    ...
    vsd <- varianceStabilizingTransformation(object, blind = FALSE)
    ...
}
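In practice, then, the two ways of obtaining a blind transformation described above should be equivalent with respect to the design. A sketch, again assuming an existing DESeqDataSet dds:

```r
# Option 1: let vst() blind itself (it internally sets design to ~1)
vsd1 <- vst(dds, blind = TRUE)

# Option 2: blind manually, then run the full (non-subsetted) transformation
design(dds) <- ~1
vsd2 <- varianceStabilizingTransformation(dds, blind = FALSE)
```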
varianceStabilizingTransformation
This function calculates a variance stabilizing transformation (VST)
from the fitted dispersion-mean relation(s) and then transforms the
count data (normalized by division by the size factors or
normalization factors), yielding a matrix of values which are now
approximately homoskedastic (having constant variance along the range
of mean values). The transformation also normalizes with respect to
library size. The rlog is less sensitive to size factors, which can be
an issue when size factors vary widely. These transformations are
useful when checking for outliers or as input for machine learning
techniques such as clustering or linear discriminant analysis.
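As the documentation notes, the transformed values are well suited to sample-level visualisation and clustering. A minimal sketch of such downstream usage, where "condition" is a hypothetical column in your colData:

```r
vsd <- vst(dds, blind = TRUE)

# PCA on the transformed values, coloured by the (hypothetical) 'condition' column
plotPCA(vsd, intgroup = "condition")

# Hierarchical clustering of samples on Euclidean distances of transformed values
sampleDists <- dist(t(assay(vsd)))
plot(hclust(sampleDists))
```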
Kevin
Thank you very much, it is much clearer now. I'm still not very comfortable with bioinformatics, so some concepts are a bit difficult for me to understand, even when I study the vignettes and documentation.