How to return the components from PCA back to the original variables?
8.1 years ago
M K ▴ 660

I used principal component analysis (PCA) in R to reduce the number of explanatory (independent) variables in my model (i.e. PCA was used for variable reduction only). After running PCA, I got 10 components. What I want to do now is map these components back to the original variables (i.e. I want to know which variables are inside each of these components). My original data matrix contains 35,000 rows and 500 columns.

R • 12k views

What you want are probably the loadings. If you don't have access to them, you can try to calculate them manually. Someone correct me if I am wrong, but the loadings are essentially the correlations between the standardized original variables and the PCs.
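A minimal sketch of that manual calculation, assuming the PCA was run with `prcomp()` on centered and scaled variables. The simulated matrix and the names `manual_loadings` and `scaled_rotation` are only for illustration and stand in for the real data. Note that `prcomp()` stores unit-length eigenvectors in `$rotation`, so they have to be multiplied by the PC standard deviations before they match the correlations.

```r
# Simulated data stands in for the real 35,000 x 500 matrix (assumption).
set.seed(1)
X <- matrix(rnorm(200 * 6), nrow = 200, ncol = 6)
X[, 2] <- X[, 1] + 0.3 * X[, 2]   # make two columns correlated
colnames(X) <- paste0("var", 1:6)

# PCA on centered, unit-variance variables.
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Correlation of each standardized variable with each PC score.
manual_loadings <- cor(scale(X), pca$x)

# The same quantity from the PCA object: eigenvectors scaled by the PC standard deviations.
scaled_rotation <- sweep(pca$rotation, 2, pca$sdev, `*`)

# Compare the two matrices, ignoring dimnames.
all.equal(manual_loadings, scaled_rotation, check.attributes = FALSE)
```

If the PCA was run without scaling, the correlations are no longer a simple rescaling of `$rotation` by `$sdev`, so this check only applies to the scaled case.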


Do you still have the matrix of loadings, or just the 10 PCs?


I have both of them.


Then you can certainly get very close to your original data, and you can even estimate values you don't have, if you wish. How close you get depends on how much of the variance the 10 PCs explain, but it's likely to be most of it (that figure should be in the PCA report).

Unfortunately, PCA can differ from implementation to implementation depending on how the centering is done and on some other minor details, so you really will have to take a proper look at the code used to generate the loadings and PCs if you didn't use the generic R prcomp().

A good place to start is: http://stats.stackexchange.com/questions/229092/how-to-reverse-pca-and-reconstruct-original-variables-from-several-principal-com

and an R specific demo here: http://stats.stackexchange.com/questions/57467/how-to-perform-dimensionality-reduction-with-pca-in-r/57478#57478
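As a rough sketch of the reconstruction, assuming `prcomp()` was used with `center = TRUE` and `scale. = TRUE`: multiply the retained scores by the transposed loadings, then undo the scaling and centering. The simulated matrix, `k = 3`, and the names `X_hat`, `var_explained` and `rmse` are illustrative only; the question would use `k = 10` on the real data.

```r
# Simulated data stands in for the real matrix (assumption).
set.seed(1)
X <- matrix(rnorm(300 * 8), nrow = 300, ncol = 8)
pca <- prcomp(X, center = TRUE, scale. = TRUE)

k <- 3  # number of retained components (10 in the question)

# Project back: scores %*% t(loadings), then undo scaling and centering.
X_hat <- pca$x[, 1:k, drop = FALSE] %*% t(pca$rotation[, 1:k, drop = FALSE])
X_hat <- sweep(X_hat, 2, pca$scale, `*`)
X_hat <- sweep(X_hat, 2, pca$center, `+`)

# Share of variance kept by the first k PCs, and the reconstruction error.
var_explained <- sum(pca$sdev[1:k]^2) / sum(pca$sdev^2)
rmse <- sqrt(mean((X - X_hat)^2))

# With all components the reconstruction is exact up to floating-point error.
X_full <- pca$x %*% t(pca$rotation)
X_full <- sweep(sweep(X_full, 2, pca$scale, `*`), 2, pca$center, `+`)
```

If the PCA came from a different routine, the centering and scaling vectors may be stored under other names or not at all, which is why checking the original code matters.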

8.1 years ago

I don't believe this is possible. Principal components are derived by projecting the data onto a vector that maximizes the spread, or variance, along that vector (see here, mostly for the visualizations). Asking which variables contributed most to this projection is a difficult question, similar to asking which points contribute most to the slope of a linear fit:

Which points would you pick and why? The principal components reconstruct the relationships in the data, but are derived from the data in a way that doesn't directly relate to any individual features of the data.

8.1 years ago

I think you are looking for the loadings. Depending on which method you used in R, these could be in the "loadings" or "rotation" slot in the object returned from the PCA routine. The loadings tell you how the original variables are weighted to form each principal component.
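For example, here is a small sketch of where the loadings live for the two base R routines, plus one way to list the most heavily weighted variables on a component. The simulated data and the helper `top_variables()` are hypothetical and only for illustration.

```r
# Simulated data stands in for the real matrix (assumption).
set.seed(1)
X <- matrix(rnorm(200 * 6), nrow = 200, ncol = 6)
colnames(X) <- paste0("var", 1:6)

# prcomp() stores the variable weights in $rotation ...
pca1 <- prcomp(X, center = TRUE, scale. = TRUE)
dim(pca1$rotation)   # variables x components

# ... while princomp() stores them in $loadings.
pca2 <- princomp(X, cor = TRUE)
dim(unclass(pca2$loadings))

# Hypothetical helper: the n variables with the largest absolute weight on one component.
top_variables <- function(rotation, component = 1, n = 3) {
  w <- rotation[, component]
  names(sort(abs(w), decreasing = TRUE))[seq_len(n)]
}

top_variables(pca1$rotation, component = 1, n = 3)
```

The sign of a component is arbitrary, so the two routines can return columns that differ by a factor of -1; ranking by absolute weight avoids that issue.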
