I am trying to figure out how to extract the genes that contribute to the 1st Principal Component of a Module in WGCNA. Any advice on methodology would be much appreciated.
I am trying to figure out how to extract the genes that contribute to the 1st Principal Component of a Module in WGCNA. Any advice on methodology would be much appreciated.
The nature of principle components is that all genes will be involved. If you mean to pick out the top5, be sure they're scaled or you'll just get the highest expression genes. The Principle Components analysis must have the genes factors, and you'd have to pick out the ones with max absolute value coefficient. Not sure about WCGNA, this is for plain PCA, which we presume is inside WCGNA.
He is referring [I think] to the component / variable loadings. I am not sure that WGCNA supports that, but my own package, PCAtools, does: https://github.com/kevinblighe/PCAtools
Depends on how you use it. To mix PCAtools with WGCNA's module output could be time-consuming (for me, to find out how to do it). I am sure that there is a way via WGCNA though. By the way, I used to work in your lab - you could ask Professor Lasky-Su's postdoc (now Investigator) and she'll probably know a way.
I think that this is what you need, no? - https://rdrr.io/cran/WGCNA/man/moduleEigengenes.html
haha The Cave is where it all happens, to be sure. Yes you are absolutely right with the moduleEigengenes which was the first thing I tried. However, it only gives the 1st PC which is an abstract value. I want the actual genes (and their values of course) that contribute to give that abstract value.
Indeed, my desk was the first on the left as you walk in the door. Was there in 2016/7.
I want the actual genes (and their values of course) that contribute to give that abstract value.
Isn't this just the original expression value, in that case? WGCNA first identifies the modules by constructing a correlation network, and then it performs PCA on the genes in each module, reporting thereafter the loadings for just the first PC (PC1) from each module. This is and is not a good approach, but this is one of the limitations of WGCNA.
Now that you have code
pZ <- prcomp(X_black, rank. = 3) #to get 1st 3 components
So you are just using prcomp, which is part of base R.
pZ$rotation
will get you the loadings for the genes for each PC
https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/prcomp
Hi reara,
With signed netwroks, In WGCNA the genes that mostly contributes to the 1st PC are the ones with the highest Module Membership or Intramodulat Connectivity
For Unsigneed networks, the genes that mostly contributes to the 1st PC are the ones with the highest absolute value of Module Membership or Intramodulat Connectivity
Thank you so much-I did try that before and this is the code and output since mine is an unsigned network. How would I find out the actual genes from below result in a tabular format:
intramodularconnectivity = intramodularConnectivity.fromExpr(datExpr, moduleColors,
corFnc = "cor", corOptions = "use = 'p'",
weights = NULL,
distFnc = "dist", distOptions = "method = 'euclidean'",
power = 6,
scaleByMax = FALSE,
ignoreColors = if (is.numeric(colors)) 0 else "grey",
getWholeNetworkConnectivity = TRUE)
Just following the Horvath standard tutorial:
datExpr0 = as.data.frame(t(femData[, -c(1:8)]));
names(datExpr0) = femData$substanceBXH;
rownames(datExpr0) = names(femData)[-c(1:8)];
# Plot a line to show the cut
abline(h = 7, col = "red");
# Determine cluster under the line
clust = cutreeStatic(sampleTree, cutHeight = 7, minSize = 10)
table(clust)
# clust 1 contains the samples we want to keep.
keepSamples = (clust==1)
datExpr = datExpr0[keepSamples, ]
nGenes = ncol(datExpr)
nSamples = nrow(datExpr)
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Not too familiar with this particular part of WGCNA; however, please show relevant code that you have used and also run
str()
on your key objects (and paste the output here).`eigenmatrix.black.na <- t(na.omit(t(eigenmatrix.black)))
df_black <- read.csv('eigengmatrix.black.na.csv', header = TRUE)
X_black <- df_black
X_black <- as.matrix(df_black)
nPC=1
n=dim(X_black)[1]
p=dim(X_black)[2]
svd1 = svd(X_black, nu = min(n, p, nPC), nv = min(n, p, nPC))
svd1$d
PCA
pZ <- prcomp(X_black, rank. = 3) #to get 1st 3 components
summary(pZ) pZ$x #to get components`
below is the eigenmatrix.black.na which consists of the genes and expression values of all the genes from the black module
When you are asking for help, the idea is to make it as easy as possible for people to find the answer. Saying you need help with "a module in WGCNA" does not make it easy. Specifying the command line you used would at least give people a hint.
Modules are the standard output of any WGCNA analyses. Hence used that word assuming familiarity with such output.