Question

Loading PCA - different value depending on the package used

2

Entering edit mode

3.6 years ago

camillab. ▴ 160

Hi,

I want to find the variable contribution (loadings) after I compute my PCA but I noticed that depending on which package I used they change. How come? which one should I use? Is something wrong with my code?

#tidy dataset
test1bis <- na.omit(test1)
test1bis <-  test1bis[,-(9:12)] #leave only FPKMs
rnames <- test1bis$`Associated.Gene.Name` #generate a matrix with rownames
test1bis <- test1bis[,-(1:3)] #remove column name
data2 <- as.matrix(test1bis) #generate a matrix with only samples
rownames(data2) <- rnames #associate rownames to the matrix
data2 <- log(data2 + 1) #log transform
test<-(t(data2)) #transpose

then I use FactoMineR :

library("FactoMineR") # for computing PCA
library("factoextra")
library("ggplot2")

#compute PCA 
res.pca <- PCA(test, graph = FALSE, scale.unit = F)

to identify loading, since it's not possible to extract directly from factomineR as mentioned here I did:

#loading PC1 PC2
loadings<-sweep(res.pca$var$coord,2,sqrt(res.pca$eig[1:ncol(res.pca$var$coord),1]),FUN="/")

and I obtained:

loading <- as.data.frame(loadings)
head(loading,5)

                 Dim.1        Dim.2         Dim.3
TSPAN6   -3.305174e-03 -0.006617528  0.0124388641
DPM1      4.520584e-03 -0.003854473 -0.0082734239
SCYL3     9.579238e-03 -0.004509442 -0.0007637849
C1orf112  8.683544e-03 -0.001069403 -0.0073240322
FGR      -9.173168e-05  0.011989805  0.0032702963
                Dim.4
TSPAN6   -0.007109117
DPM1      0.015185883
SCYL3    -0.003513365
C1orf112  0.007451701
FGR      -0.005777467

But if I use prcomp function on the same dataset I get different output for my loadings:

res.pc2a <- prcomp(test, center=T, scale=F)

load <- res.pc2a$rotation
head(load,5)

                   PC1           PC2           PC3
TSPAN6    1.029975e-03  0.0025505542  0.0054742962
DPM1     -8.932191e-04  0.0010277374 -0.0022086783
SCYL3    -4.946089e-03  0.0028874002 -0.0005080739
C1orf112 -1.798157e-03  0.0003073623 -0.0020600735
FGR       5.811615e-05 -0.0130233750  0.0035801752
                  PC4           PC5
TSPAN6   -0.003903774  0.6857941304
DPM1      0.004895724  0.5280178531
SCYL3    -0.002546638  0.4072318978
C1orf112  0.002616394 -0.2555861099
FGR      -0.007457418 -0.0009055744

And if use PCAtool :

library(tidyr) #does not accept duplicate name
test1bis <- na.omit(test1)
test1bis <-  test1bis[,-(9:12)] #leave only FPKMs
rnames <-test1bis$ID
test1bis <- test1bis[,-(1:3)] #remove column name
data2 <- as.matrix(test1bis) #generate a matrix with only samples
rownames(data2) <- rnames #associate rownames to the matrix
data2 <- log(data2 + 1) #log transform
#assign column name for the ID of samples
idx <- colnames(test1bis)

#PCA
library(PCAtools)
p <- pca(data2, scale=F)

plotloadings(p) #plot loadings all PCs
loadign <- p$loadings
head(loading,5)

(The IDs are the same of gene name shown before ):

                               PC1           PC2
ENSG00000000003  1.037372e-03  0.0025831931
ENSG00000000419 -8.992832e-04  0.0010416784
ENSG00000000457 -4.994887e-03  0.0029404936
ENSG00000000460 -1.814579e-03  0.0003148908
ENSG00000000938  4.281306e-05 -0.0132009120
                          PC3          PC4
ENSG00000000003  0.0054934570 -0.004052016
ENSG00000000419 -0.0022215836  0.005110726
ENSG00000000457 -0.0005286536 -0.002651556
ENSG00000000460 -0.0020754794  0.002727814
ENSG00000000938  0.0035755992 -0.007823109
                          PC5
ENSG00000000003  0.7868385062
ENSG00000000419 -0.2839137164
ENSG00000000457 -0.3205780708
ENSG00000000460  0.4334467281
ENSG00000000938 -0.0002801099

Thank you very much!

Camilla

factormineR PCAtools pca prcomp loadings • 2.2k views

ADD COMMENT • link updated 3.6 years ago by Kevin Blighe 88k • written 3.6 years ago by camillab. ▴ 160

2

Entering edit mode

I checked the parameters of the 3 methods and look like the PCA does not have center=TRUE option. In prcomp you centered it on your command line and in pca from PCAtools this is a default parameter.

That may be a start.

https://www.rdocumentation.org/packages/FactoMineR/versions/2.4/topics/PCA

https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/prcomp

https://rdrr.io/bioc/PCAtools/man/pca.html

And I guess the slight differences in digits you have between prcomp and pca loadings is due to the Principal Componant calculation in both packages.

ADD REPLY • link 3.6 years ago by Bastien Hervé 5.9k

0

Entering edit mode

Thank you, I wrote an email to the developer of FactorMineR long time ago and he said that "There is no way to use the PCA function without centring data" so not sure what could the the problem!

ADD REPLY • link 3.6 years ago by camillab. ▴ 160

1

Entering edit mode

I did not see it at first, but the Loadings of the first PC are in e-n which make the loading values super small. These genes are not the top loadings of each PCs. And the small differences that one can see might come from the inner calculation of the principal component of each tool.

Try to order the loadings for PC1 and see if you have the same top loadings genes with each package.

ADD REPLY • link 3.6 years ago by Bastien Hervé 5.9k

0

Entering edit mode

I will reorder, also I did tried again without order the loadings and this time (I re-run the same script with FactorMineR but this time I get the same values but opposite sign.

                 Dim.1         Dim.2         Dim.3        Dim.4
TSPAN6   -1.037372e-03 -0.0025831931  0.0054934570 -0.004052016
DPM1      8.992832e-04 -0.0010416784 -0.0022215836  0.005110726
SCYL3     4.994887e-03 -0.0029404936 -0.0005286536 -0.002651556
C1orf112  1.814579e-03 -0.0003148908 -0.0020754794  0.002727814
FGR      -4.281306e-05  0.0132009120  0.0035755992 -0.007823109

Why that?

ADD REPLY • link 3.6 years ago by camillab. ▴ 160

0

Entering edit mode

That is pretty hard to say without a proper example dataset and command lines. Can you redo your analysis with mtcars or iris please, so everyone can redo it at home and experiment to find some solutions.

May be this : https://stats.stackexchange.com/questions/88880/does-the-sign-of-scores-or-of-loadings-in-pca-or-fa-have-a-meaning-may-i-revers

ADD REPLY • link 3.6 years ago by Bastien Hervé 5.9k

0

Entering edit mode

Just for clarity: PCAtools performs PCA via BiocSingular::runPCA() - here is the function call in the code: https://github.com/kevinblighe/PCAtools/blob/master/R/pca.R#L153-L157

I am sure that we checked this previously and found that PCAtools gives the same output as prcomp().

Please do ensure that center and scale are the same

ADD REPLY • link 3.6 years ago by Kevin Blighe 88k

0

Entering edit mode

Apologies, I have missed the comments for some reason. PCAtools gives me the same results as prcomp, I was wondering why factomineR wasn`t matching with the other two packages

ADD REPLY • link 3.6 years ago by camillab. ▴ 160

0

Entering edit mode

Cool, well, I had hoped that the factomineR developer would have given a better response other than "There is no way to use the PCA function without centring data" It must be doing some other filtering or transformation.

ADD REPLY • link 3.6 years ago by Kevin Blighe 88k