Get the names of the top-contributing variables from a PCA result
3.9 years ago

Hi everyone,

I used the factoextra package to make a biplot showing the top 20 contributing variables. The code is:

fviz_pca_biplot(pca_result1, title = "", col.var = "steelblue", select.var = list(contrib = 20))

The result looks good. However, I need the names of those 20 variables in a list that I can use later. Is there any way to get them other than writing them down from the biplot?

Thank you,

Perhaps save the plot object and then run str() to see if these top 20 are stored in any specific part of the plot object:

myplot <- fviz_pca_biplot(pca_result1, title = "", col.var = "steelblue", select.var = list(contrib = 20))
str(myplot)

...or check the source code of fviz_pca_biplot to see how it selects these top 20, and then run that code yourself.

Thank you so much for your help. I will look into the fviz_pca_biplot function to see how it does the calculation. Thanks.

Hello, guillermo.luque.ds

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one answer, if they all work.

Thanks, I did upvote and accept the answer.

3.8 years ago

Hi @luongthang1908, following the factoextra package documentation, you can get the top N variables contributing to PC1 and PC2 combined like this:

# X is a data frame/matrix with n rows (samples) and p numeric columns (variables)
pca_result1 <- FactoMineR::PCA(X, graph = FALSE)  # graph = FALSE skips the default plots
contrib_PC12 <- pca_result1$var$contrib[, 1:2]     # contribution (%) of each variable to PC1 and PC2
eig_PC12 <- pca_result1$eig[1:2, 1]                # eigenvalues of PC1 and PC2
# total contribution = eigenvalue-weighted average of the per-component contributions
contrib_total <- apply(contrib_PC12, 1, function(x) sum(x * eig_PC12) / sum(eig_PC12))
N <- 20
# head() avoids NA names if there are fewer than N variables
topN_PC12 <- head(names(sort(contrib_total, decreasing = TRUE)), N)

Then topN_PC12 contains the variables you are interested in.
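If pca_result1 comes from base R's prcomp instead of FactoMineR::PCA, there is no $var$contrib element, but the same ranking can be computed from pca$rotation and pca$sdev. A minimal sketch, assuming factoextra's definition of a variable's contribution for a prcomp object (squared loadings scaled so each component sums to 100%) and the same eigenvalue weighting as above; top_contrib_vars() is a hypothetical helper, and mtcars with scale. = TRUE is only example data.

```r
# Hypothetical helper: rank variables by their eigenvalue-weighted contribution
# to the selected principal components of a prcomp() result.
top_contrib_vars <- function(pca, axes = 1:2, n = 20) {
  # Contribution (%) of each variable to each PC: squared loadings, scaled so
  # that every column sums to 100 (prcomp loadings already have unit length).
  loadings_sq <- pca$rotation^2
  contrib <- sweep(loadings_sq, 2, colSums(loadings_sq), "/") * 100
  # Eigenvalues are the squared standard deviations of the components.
  eig <- pca$sdev[axes]^2
  # Total contribution over the selected axes: eigenvalue-weighted average.
  contrib_total <- as.vector(contrib[, axes, drop = FALSE] %*% eig) / sum(eig)
  names(contrib_total) <- rownames(contrib)
  # head() avoids NA names when n exceeds the number of variables.
  head(names(sort(contrib_total, decreasing = TRUE)), n)
}

# Example data only; scale. = TRUE is an assumption, use whatever your own PCA used.
pca <- prcomp(mtcars, scale. = TRUE)
top5 <- top_contrib_vars(pca, axes = 1:2, n = 5)
print(top5)
```

With axes = 1:2 this mirrors the weighting in the FactoMineR code above; passing a single axis (e.g. axes = 1) gives the per-component ranking.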

Hello, @guillermo.luque.ds

I have just tried your suggestion, but I get an error:

Error in apply(contrib_PC12, 1, function(x) { : dim(X) must have a positive length

Hi @luongthang1908, in my example I assumed (perhaps wrongly) that pca_result1 was the output of the PCA function from the FactoMineR package. A prcomp object has no $var element, so contrib_PC12 ends up NULL, which is what makes apply() fail. I have updated the code, so hopefully this solves the problem for you.

Hi @guillermo.luque.ds, thank you so much for your help. The FactoMineR::PCA(X) function works perfectly.

Does this mean that FactoMineR::PCA works differently from the regular base R PCA function?

Hi @luongthang1908, I assume the regular function you mean is prcomp. At their core, both functions use singular value decomposition to perform a principal component analysis of a given matrix (e.g. a gene counts table). However, they return the results in different formats. Regarding the PCA function from FactoMineR, this link has some additional information that may be useful for your analyses.
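One default is also worth checking when comparing the two: FactoMineR::PCA standardizes the variables by default (scale.unit = TRUE), whereas prcomp only centers them unless you pass scale. = TRUE, so the same data can give different rankings. A minimal base-R sketch, using mtcars purely as example data: it checks that prcomp's component variances are the squared singular values of the centered matrix divided by n - 1, and that with scale. = TRUE they become the eigenvalues of the correlation matrix, which is the standardization FactoMineR applies by default (FactoMineR itself is not run here).

```r
# prcomp() is an SVD of the centered (and optionally scaled) data matrix.
X <- as.matrix(mtcars)

# prcomp default: center only (scale. = FALSE)
p_raw <- prcomp(X)
Xc <- scale(X, center = TRUE, scale = FALSE)
sv <- svd(Xc)
# Component variances = squared singular values / (n - 1)
eig_svd <- sv$d^2 / (nrow(X) - 1)
stopifnot(isTRUE(all.equal(p_raw$sdev^2, eig_svd)))

# Standardized: every variable gets unit variance before the decomposition.
p_std <- prcomp(X, scale. = TRUE)
# With standardized data, the eigenvalues are those of the correlation matrix.
eig_cor <- eigen(cor(X), symmetric = TRUE, only.values = TRUE)$values
stopifnot(isTRUE(all.equal(p_std$sdev^2, eig_cor)))

# Compare the leading eigenvalues with and without scaling.
print(round(p_raw$sdev[1:3]^2, 2))
print(round(p_std$sdev[1:3]^2, 2))
```

Without scaling, variables measured on large numeric ranges dominate the first components, which is why it matters which default was used before ranking contributions.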

3.8 years ago
MatthewP

Add lines and arrows from (0, 0) to the pca$rotation coordinates on your PCA biplot. Higher absolute values in the pca$rotation matrix mean a higher contribution, so longer arrows mean a higher contribution.

> head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
> pca <- prcomp(mtcars)
> head(pca$rotation[,1:2])
              PC1          PC2
mpg  -0.038118199  0.009184847
cyl   0.012035150 -0.003372487
disp  0.899568146  0.435372320
hp    0.434784387 -0.899307303
drat -0.002660077 -0.003900205
wt    0.006239405  0.004861023
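The arrow-length idea can be turned into a ranked list of names directly. A minimal sketch, assuming each variable's arrow runs from (0, 0) to its (PC1, PC2) row of pca$rotation as described above; this plain Euclidean length is not weighted by the eigenvalues, so it can order variables differently from factoextra's contrib-based selection. top_by_arrow is a hypothetical name. With the unscaled mtcars PCA shown above, disp and hp have by far the longest arrows.

```r
pca <- prcomp(mtcars)  # unscaled, as in the session above

# Arrow end points: each variable's loadings on PC1 and PC2
arrows_xy <- pca$rotation[, 1:2]
# Arrow length from (0, 0) to (PC1, PC2)
arrow_len <- sqrt(rowSums(arrows_xy^2))

# Variable names ordered from longest to shortest arrow
top_by_arrow <- names(sort(arrow_len, decreasing = TRUE))
print(head(top_by_arrow, 5))

# Optionally draw the arrows with base graphics (only in an interactive session)
if (interactive()) {
  plot(arrows_xy, type = "n", xlab = "PC1", ylab = "PC2", asp = 1)
  arrows(0, 0, arrows_xy[, 1], arrows_xy[, 2], length = 0.08, col = "steelblue")
  text(arrows_xy, labels = rownames(arrows_xy), pos = 3, cex = 0.8)
}
```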