How do you extract data coordinates from PCA in R?
6.8 years ago

I need to extract the x,y coordinates of a PCA plot (generated in R) so that I can plot them in Excel (my boss prefers Excel).

The code to generate the PCA:

pca <- prcomp(data, scale. = TRUE, center = TRUE)
autoplot(pca, label = TRUE)

If we look at pca$x, the first two PC scores for an example point are:

29. 3.969599e+01 6.311406e+01

So for sample 29, the PC scores are 39.69599 and 63.11406.

However, if you look at the plot R produces, the coordinates of that point are not 39.69599 and 63.11406 but roughly 0.09 and 0.2. Some simple algebra could presumably work out how the PC scores are converted into the plotted coordinates, but I can't do that by hand for ~80 samples.

Can someone please shed some light on how R gets these coordinates, and point me either to where the plotted coordinates are stored or to a simple command that generates the plotted data matrix?

NOTE: pca$x does not give me what I want


Is this the actual code you typed into R? If so, what autoplot are you using, as the autoplot from ggplot2 does not have a method for prcomp objects? To be more specific, I suspect that using plot(pca$x[,1:2]) the coordinates will match up.


I'm using the autoplot function from ggfortify, which allows autoplot to understand PCA objects.


That is the issue, then. ggfortify autoplot.prcomp plots values that have been transformed (see https://github.com/sinhrks/ggfortify/blob/master/R/fortify_stats.R#L140 and https://github.com/sinhrks/ggfortify/blob/master/R/fortify_stats.R#L259, for example). You'll need to apply those transformations if you want the same coordinates as autoplot. Note that the ggfortify package has been removed from CRAN....
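As a rough sketch of what that transformation appears to be: from my reading of the linked source, with autoplot's default scale = 1 each PC's scores are divided by that PC's standard deviation times the square root of the number of samples. Treat this as an assumption and check it against your installed ggfortify version. The built-in iris data stands in for your `data`, and the output file name is just a placeholder.

```r
# Stand-in for the poster's `data` (an assumption): numeric columns of iris
dat <- iris[, 1:4]
pca <- prcomp(dat, scale. = TRUE, center = TRUE)

# Assumed ggfortify transformation (autoplot default scale = 1):
# divide the scores of each PC by sdev * sqrt(number of samples)
n <- nrow(pca$x)
lam <- pca$sdev[1:2] * sqrt(n)
plot_coords <- as.data.frame(sweep(pca$x[, 1:2], 2, lam, "/"))

head(plot_coords)

# Export for Excel; row names carry the sample labels
write.csv(plot_coords, "pca_autoplot_coords.csv", row.names = TRUE)
```

If this matches, the values in plot_coords should line up with the axes of your autoplot figure. Calling autoplot(pca, scale = 0) should instead plot the unscaled pca$x scores.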

6.8 years ago
Jake Warner ▴ 840

From the comments, it sounds like your autoplot is scaling the data. Plotting the PC1 and PC2 scores directly with ggplot should give you the plot you want:

library(ggplot2)
# PC scores, one row per sample (pca is the prcomp object from the question)
scores <- as.data.frame(pca$x)
p <- ggplot(data = scores, aes(x = PC1, y = PC2)) +
    geom_point(size = 2) +
    scale_fill_hue(l = 40) +
    coord_fixed(ratio = 1, xlim = range(scores$PC1), ylim = range(scores$PC2))
p

Then just write out scores and pass the file to your mentor to use in Excel.
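One way to write the scores out, as a minimal sketch: the iris data again stands in for your own data, and the file name pca_scores.csv is only a placeholder.

```r
# Stand-in for the poster's `data` (an assumption)
dat <- iris[, 1:4]
pca <- prcomp(dat, scale. = TRUE, center = TRUE)
scores <- as.data.frame(pca$x)

# Hypothetical output file name; Excel opens CSV files directly
out_file <- "pca_scores.csv"

# Keep row names so each row stays tied to its sample label
write.csv(scores[, c("PC1", "PC2")], file = out_file, row.names = TRUE)
```

The first column of the CSV holds the sample names, followed by the PC1 and PC2 scores.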

6.8 years ago
apeltzer ▴ 150

Did you check the scale parameter? According to the manual, the values are scaled when this is set to true and could explain why your values are scaled automatically before plotting. You could also set the scaling factor yourself and see if that resolves your issue between visualization and your matrix.

https://stat.ethz.ch/R-manual/R-devel/library/stats/html/prcomp.html

Just an idea/hint, can't check right now.


Tried prcomp without scale and still get discrepancies between visualisation and PC scores


"scale" in prcomp rescales the input so that each variable (column) has an SD of 1; it is applied per variable, not per sample. That affects how the pca$x values come out, but not how they are plotted.

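A quick way to see what the scale option does, sketched on the built-in iris data (a stand-in, not the poster's data): standardizing the columns by hand and running prcomp without scaling should give the same scores as letting prcomp do it.

```r
# Stand-in data (an assumption): numeric columns of iris
dat <- iris[, 1:4]

# Let prcomp centre and scale each variable
pca_scaled <- prcomp(dat, center = TRUE, scale. = TRUE)

# Standardize by hand: each column gets mean 0 and SD 1
dat_std <- scale(dat)
apply(dat_std, 2, sd)

# PCA on the pre-standardized data, with no further centring or scaling
pca_manual <- prcomp(dat_std, center = FALSE, scale. = FALSE)

# The two sets of scores should agree
all.equal(pca_scaled$x, pca_manual$x)
```

Either way, the scores in pca$x are what plot(pca$x[, 1:2]) would show; any further rescaling comes from the plotting function.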