Entering edit mode
3.6 years ago
camillab.
▴
160
Hi,
I want to find the variable contribution (loadings) after I compute my PCA but I noticed that depending on which package I used they change. How come? which one should I use? Is something wrong with my code?
#tidy dataset
test1bis <- na.omit(test1)
test1bis <- test1bis[,-(9:12)] #leave only FPKMs
rnames <- test1bis$`Associated.Gene.Name` #generate a matrix with rownames
test1bis <- test1bis[,-(1:3)] #remove column name
data2 <- as.matrix(test1bis) #generate a matrix with only samples
rownames(data2) <- rnames #associate rownames to the matrix
data2 <- log(data2 + 1) #log transform
test<-(t(data2)) #transpose
then I use FactoMineR
:
library("FactoMineR") # for computing PCA
library("factoextra")
library("ggplot2")
#compute PCA
res.pca <- PCA(test, graph = FALSE, scale.unit = F)
to identify loading, since it's not possible to extract directly from factomineR as mentioned here I did:
#loading PC1 PC2
loadings<-sweep(res.pca$var$coord,2,sqrt(res.pca$eig[1:ncol(res.pca$var$coord),1]),FUN="/")
and I obtained:
loading <- as.data.frame(loadings)
head(loading,5)
Dim.1 Dim.2 Dim.3
TSPAN6 -3.305174e-03 -0.006617528 0.0124388641
DPM1 4.520584e-03 -0.003854473 -0.0082734239
SCYL3 9.579238e-03 -0.004509442 -0.0007637849
C1orf112 8.683544e-03 -0.001069403 -0.0073240322
FGR -9.173168e-05 0.011989805 0.0032702963
Dim.4
TSPAN6 -0.007109117
DPM1 0.015185883
SCYL3 -0.003513365
C1orf112 0.007451701
FGR -0.005777467
But if I use prcomp
function on the same dataset I get different output for my loadings:
res.pc2a <- prcomp(test, center=T, scale=F)
load <- res.pc2a$rotation
head(load,5)
PC1 PC2 PC3
TSPAN6 1.029975e-03 0.0025505542 0.0054742962
DPM1 -8.932191e-04 0.0010277374 -0.0022086783
SCYL3 -4.946089e-03 0.0028874002 -0.0005080739
C1orf112 -1.798157e-03 0.0003073623 -0.0020600735
FGR 5.811615e-05 -0.0130233750 0.0035801752
PC4 PC5
TSPAN6 -0.003903774 0.6857941304
DPM1 0.004895724 0.5280178531
SCYL3 -0.002546638 0.4072318978
C1orf112 0.002616394 -0.2555861099
FGR -0.007457418 -0.0009055744
And if use PCAtool
:
library(tidyr) #does not accept duplicate name
test1bis <- na.omit(test1)
test1bis <- test1bis[,-(9:12)] #leave only FPKMs
rnames <-test1bis$ID
test1bis <- test1bis[,-(1:3)] #remove column name
data2 <- as.matrix(test1bis) #generate a matrix with only samples
rownames(data2) <- rnames #associate rownames to the matrix
data2 <- log(data2 + 1) #log transform
#assign column name for the ID of samples
idx <- colnames(test1bis)
#PCA
library(PCAtools)
p <- pca(data2, scale=F)
plotloadings(p) #plot loadings all PCs
loadign <- p$loadings
head(loading,5)
(The IDs are the same of gene name shown before ):
PC1 PC2
ENSG00000000003 1.037372e-03 0.0025831931
ENSG00000000419 -8.992832e-04 0.0010416784
ENSG00000000457 -4.994887e-03 0.0029404936
ENSG00000000460 -1.814579e-03 0.0003148908
ENSG00000000938 4.281306e-05 -0.0132009120
PC3 PC4
ENSG00000000003 0.0054934570 -0.004052016
ENSG00000000419 -0.0022215836 0.005110726
ENSG00000000457 -0.0005286536 -0.002651556
ENSG00000000460 -0.0020754794 0.002727814
ENSG00000000938 0.0035755992 -0.007823109
PC5
ENSG00000000003 0.7868385062
ENSG00000000419 -0.2839137164
ENSG00000000457 -0.3205780708
ENSG00000000460 0.4334467281
ENSG00000000938 -0.0002801099
Thank you very much!
Camilla
I checked the parameters of the 3 methods and look like the
PCA
does not havecenter=TRUE
option. Inprcomp
you centered it on your command line and inpca
fromPCAtools
this is a default parameter.That may be a start.
https://www.rdocumentation.org/packages/FactoMineR/versions/2.4/topics/PCA
https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/prcomp
https://rdrr.io/bioc/PCAtools/man/pca.html
And I guess the slight differences in digits you have between
prcomp
andpca
loadings is due to the Principal Componant calculation in both packages.Thank you, I wrote an email to the developer of
FactorMineR
long time ago and he said that "There is no way to use the PCA function without centring data" so not sure what could the the problem!I did not see it at first, but the Loadings of the first PC are in
e-n
which make the loading values super small. These genes are not the top loadings of each PCs. And the small differences that one can see might come from the inner calculation of the principal component of each tool.Try to order the loadings for PC1 and see if you have the same top loadings genes with each package.
I will reorder, also I did tried again without order the loadings and this time (I re-run the same script with
FactorMineR
but this time I get the same values but opposite sign.Why that?
That is pretty hard to say without a proper example dataset and command lines. Can you redo your analysis with
mtcars
oriris
please, so everyone can redo it at home and experiment to find some solutions.May be this : https://stats.stackexchange.com/questions/88880/does-the-sign-of-scores-or-of-loadings-in-pca-or-fa-have-a-meaning-may-i-revers
Just for clarity: PCAtools performs PCA via
BiocSingular::runPCA()
- here is the function call in the code: https://github.com/kevinblighe/PCAtools/blob/master/R/pca.R#L153-L157I am sure that we checked this previously and found that PCAtools gives the same output as
prcomp()
.Please do ensure that
center
andscale
are the sameApologies, I have missed the comments for some reason. PCAtools gives me the same results as prcomp, I was wondering why factomineR wasn`t matching with the other two packages
Cool, well, I had hoped that the factomineR developer would have given a better response other than "There is no way to use the PCA function without centring data" It must be doing some other filtering or transformation.