I am using the autoplot() function from the ggfortify library in R. autoplot() also supports cluster analysis. What algorithm does autoplot() use to find the first two principal components?
The PCA function that it uses is prcomp(), which is the same as what my own package (PCAtools) and DESeq2 use.
Yes, it is performing partitioning around medoids (PAM) and identifying a fixed number of clusters (the user pre-selects the desired number as the second parameter to pam()). autoplot() then performs PCA on the dataset and shades the points based on the PAM cluster assignments. Here is the proof:
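library(ggfortify)  # autoplot() methods for prcomp and pam objects
library(cluster)    # pam()
library(grid)
library(gridExtra)  # grid.arrange()
# Left panel: plain PCA. Right panel: PAM with 2 clusters.
g1 <- autoplot(prcomp(iris[-5]), frame = TRUE, frame.type = 'norm')
g2 <- autoplot(pam(iris[-5], 2), frame = TRUE, frame.type = 'norm')
grid.arrange(g1, g2, ncol = 2)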
They are the same points, but highlighted differently.
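The two steps described above (PCA on the data, then colouring by PAM cluster) can be sketched by hand without ggfortify. This is only an illustration of the idea, not ggfortify's actual code: the data frame name scores is my own, base graphics stand in for ggplot2, and the axes that autoplot() draws may be rescaled relative to the raw prcomp() scores used here.

```r
library(cluster)  # pam()

x   <- iris[-5]      # numeric columns only
pca <- prcomp(x)     # PCA on the centred data
fit <- pam(x, 2)     # PAM with k = 2 clusters

# First two PC scores plus the PAM label of each observation
scores <- data.frame(pca$x[, 1:2], cluster = factor(fit$clustering))

# Same points as a plain PCA plot, coloured by PAM cluster
plot(scores$PC1, scores$PC2, col = scores$cluster, pch = 19,
     xlab = "PC1", ylab = "PC2")
```

Note that the clustering is computed on the original variables, not on the principal components; the PCA is only used to get two axes to draw on.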
As is typical with many CRAN (and other) packages, the documentation is poor and the program functionality does not make it readily obvious what the function is doing.
In PCA, principal components are ordered by the fraction of variance they explain (i.e. by the eigenvalues of the covariance matrix, largest first). If this doesn't make sense to you, please read a tutorial on PCA.
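A quick way to check the ordering on the iris data is to compare the component variances from prcomp() with the eigenvalues of the covariance matrix. This is a minimal sketch; the variable names are my own.

```r
x   <- iris[-5]
pca <- prcomp(x)

# Variance of each principal component
pc_var <- pca$sdev^2

# Eigenvalues of the sample covariance matrix (returned in decreasing order)
eig <- eigen(cov(x))$values

# Fraction of total variance explained by each component
var_explained <- pc_var / sum(pc_var)

all.equal(pc_var, eig)   # the two agree up to numerical tolerance
round(var_explained, 3)  # decreasing, so PC1 and PC2 explain the most
```

The "first two principal components" are therefore simply the two directions with the largest eigenvalues.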
If you're talking about this line:
autoplot(pam(iris[-5], 3), frame = TRUE, frame.type = 'norm')
then there's no PCA. autoplot() is a "smart" plotting function. It recognizes what kind of object is passed to it and calls the appropriate specialized plotting function. If you pass it an object from the cluster package, it plots the data and automatically colours the points according to cluster labels. If you pass it a PCA object, then it will plot the data against the first two PCs.
EDIT: I am wrong. autoplot does indeed perform PCA on cluster objects. See Kevin's answer.
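The dispatch part of that description still holds: autoplot() is a generic, and R picks a method based on the class of the object. A toy sketch of the mechanism, where describe() and its methods are hypothetical names made up for illustration and are not part of ggfortify:

```r
library(cluster)  # pam()

# Hypothetical generic, for illustration only
describe <- function(object, ...) UseMethod("describe")
describe.prcomp    <- function(object, ...) "PCA object: plot PC1 vs PC2"
describe.partition <- function(object, ...) "cluster object: colour by cluster label"
describe.default   <- function(object, ...) "no specialised method"

class(prcomp(iris[-5]))  # "prcomp"
class(pam(iris[-5], 3))  # "pam" "partition"

describe(prcomp(iris[-5]))
describe(pam(iris[-5], 3))
```

A pam object has no describe.pam method here, so R falls through to the method for its second class, partition.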
ggfortify has a vignette, Plotting PCA (Principal Component Analysis). Which part is not clear? Please provide example data and code.
autoplot(pam(iris[-5], 3), frame = TRUE, frame.type = 'norm')
In this call, autoplot() finds the first two principal components for the clustering object returned by pam(). I wanted to know what algorithm autoplot() uses here.