The DESeq2 vignette states:
"The point of these two transformations, the VST and the rlog, is to remove the dependence of the variance on the mean, particularly the high variance of the logarithm of count data when the mean is low."
and the documentation of rlog explains:
"The transformation is useful when checking for outliers or as input for machine learning techniques such as clustering or linear discriminant analysis."
I understand that "checking for outliers" means checking for outliers via a PCA plot (or something similar).
Why is minimizing differences (between samples) for rows with low counts important for the PCA plot?
Why does the variance have to be independent of the mean (homoscedasticity) for that?
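To make the effect from the vignette quote concrete, here is a minimal sketch of what "high variance of the logarithm of count data when the mean is low" looks like. It rests on assumptions that are not from DESeq2: counts are simulated as pure Poisson noise (DESeq2 itself models negative binomial counts), the transform is a naive log2(count + 1) rather than the VST or rlog, and the helper names `poisson` and `log_count_variance` are made up for illustration.

```python
import math
import random
import statistics


def poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for lam up to a few hundred; exp(-lam) underflows for much larger values.
    """
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def log_count_variance(lam, n=2000, seed=0):
    """Sample variance of log2(count + 1) across n simulated samples of one gene."""
    rng = random.Random(seed)
    values = [math.log2(poisson(rng, lam) + 1) for _ in range(n)]
    return statistics.variance(values)


# Two hypothetical genes with no true difference between samples, only counting noise.
low_var = log_count_variance(2)     # lowly expressed gene, mean count 2
high_var = log_count_variance(200)  # highly expressed gene, mean count 200

print(f"variance of log2(count + 1) at mean 2:   {low_var:.4f}")
print(f"variance of log2(count + 1) at mean 200: {high_var:.4f}")
```

Under these assumptions the low-count gene shows a much larger variance on the log scale even though neither gene differs between samples. Since PCA picks the directions of largest variance, many such rows could plausibly dominate the leading components with noise, which seems to be what the transformations are meant to prevent.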