I m using WGCNA for co-expression network im trying to find the hubgene finding fucntion but im getting error im not sure what im doing wrong.
library(WGCNA)
library(flashClust)
enableWGCNAThreads()
options(stringsAsFactors = FALSE);
countdata <- read.csv('UP_DOWN_WGCNA.csv', header=TRUE)
dim(countdata)
names(countdata)
femData <- countdata
names(femData)
datExpr = as.data.frame(t(femData[, -c(1)]))
View(datExpr)
dim(datExpr)
names(datExpr) = femData$gene
View(datExpr)
rownames(datExpr) = names(femData)[-c(1)]
dim(datExpr)
powers = c(c(1:10), seq(from = 12, to=20, by=2))
#powers = c(1:20) # in practice this should include powers up to 20.
sft=pickSoftThreshold(datExpr,dataIsExpr = TRUE,powerVector = powers,corFnc = cor,corOptions = list(use = 'p'),networkType = "signed")
sizeGrWindow(9, 7)
par(mfrow = c(1,2))
cex1 = 0.9
plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],xlab="Soft Threshold (power)",ylab="Scale Free Topology Model Fit, signed R^2",type="n", main = paste("Scale independence"))
text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],labels=powers,cex=cex1,col="red")
# Red line corresponds to using an R^2 cut-off
abline(h=0.1,col="red")
# Mean connectivity as a function of the soft-thresholding power
plot(sft$fitIndices[,1], sft$fitIndices[,5],xlab="Soft Threshold (power)",ylab="Mean Connectivity", type="n",main = paste("Mean connectivity"))
text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1,col="red")
softPower = 19
adj= adjacency(datExpr,type = "unsigned", power = softPower)
#turn adjacency matrix into topological overlap to minimize the effects of noise and spurious associations
TOM=TOMsimilarityFromExpr(datExpr,networkType = "unsigned", TOMType = "unsigned", power = softPower)
colnames(TOM) =rownames(TOM) = datExpr
dissTOM=1-TOM
geneTree = flashClust(as.dist(dissTOM),method="average")
plot(geneTree, xlab="", sub="",cex=0.3)
minModuleSize = 20;
ds = 3
cutHeight = 0.99999
# Module identification using dynamic tree cut
dynamicMods = cutreeDynamic(dendro = geneTree, method="tree", minClusterSize = minModuleSize);
dynamicMods = cutreeDynamic(dendro = geneTree, distM = dissTOM, method="hybrid", deepSplit = 2, pamRespectsDendro = FALSE, minClusterSize = minModuleSize);
#the following command gives the module labels and the size of each module. Lable 0 is reserved for unassigned genes
table(dynamicMods)
dynamicColors = labels2colors(dynamicMods)
table(dynamicColors)
plotDendroAndColors(geneTree, dynamicColors, "Dynamic Tree Cut", dendroLabels = FALSE, hang = 0.03, addGuide = TRUE, guideHang = 0.05, main = "Gene dendrogram and module colors")
restGenes= (dynamicColors != "red")
diss1=1-TOMsimilarityFromExpr(datExpr[,restGenes], power = softPower)
collectGarbage()
colnames(diss1) =rownames(diss1) = datExpr[restGenes]
hier1=flashClust(as.dist(diss1), method="average" )
plotDendroAndColors(hier1, dynamicColors[restGenes], "Dynamic Tree Cut", dendroLabels = FALSE, hang = 0.03, addGuide = TRUE, guideHang = 0.05, main = "Gene dendrogram and module colors")
diag(diss1) = NA
sizeGrWindow(7,7)
collectGarbage()
TOMplot(diss1, hier1, as.character(dynamicColors[restGenes]))
module_colors= setdiff(unique(dynamicColors), "grey")
for (color in module_colors){
module=datExpr[which(dynamicColors==color)]
write.table(module, paste("module_",color, ".txt",sep=""), sep="\t", row.names=TRUE, col.names=TRUE,quote=FALSE)
}
module.order <- unlist(tapply(1:ncol(datExpr),as.factor(dynamicColors),I))
m<-t(t(datExpr[,module.order])/apply(datExpr[,module.order],2,max))
heatmap(t(m),zlim=c(0,1),col= redWhiteGreen(256),Rowv=NA,Colv=NA,labRow=NA,scale="row",RowSideColors=dynamicColors[module.order])
MEList = moduleEigengenes(datExpr, colors = dynamicColors)
MEs = MEList$eigengenes
plotEigengeneNetworks(MEs, "", marDendro = c(0,4,1,2), marHeatmap = c(3,4,1,2))
#################################################
colnames(TOM) =rownames(TOM) = datExpr
tree1 <- tree2 <- fastcluster::hclust(as.dist(1-TOM),method="average")
colorh = labels2colors(datExpr)
hubs = chooseTopHubInEachModule(datExpr, colorh)
hubs
Warning message:
In labels2colors(datExpr) :
labels2colors: Number of labels exceeds number of avilable colors. Some colors will be repeated 1 times.
> hubs = chooseTopHubInEachModule(datExpr, colorh)
Error in `[.data.frame`(datExpr, , colorh == m) :
undefined columns selected
Im getting this error
Warning message: In labels2colors(datExpr) : labels2colors: Number of labels exceeds number of avilable colors. Some colors will be repeated 1 times.
hubs = chooseTopHubInEachModule(datExpr, colorh) Error in
[.data.frame
(datExpr, , colorh == m) : undefined columns selected
Any help or suggestion would be highly appreciated , am I would like to know if my script is correct or not ..
yes i followed the tutorial ,and "
datExpr
" contains my gene expression values except gene name ...Okay, great! What's the output of
labels2colors(datExpr)
?this is the message i get
Ah, I think that I see what's happening...
WGCNA may have identified just too many modules in your data, i.e., more modules that there are colours (by name) in the R palette! WGCNA, a you know, automatically assigns colour names to each palette.
Any way to see if that is the case? - how many modules were identified?
41 modules I get should i increase minModuleize which i set to 20 if i increase would it help?
No, not so sure that's the issue.
I think that you need to change this command:
With that, you may be assigning colours to every value in the entire datExpr object, which would explain why it complains that there are not enough available colours.
That command should just be applied to the module IDs/names, such as:
or
i increase minModulesize to 30 now i get like 17 module but still im getting the same issue im not sure what to do ..but i will try your suggestion
may be you can suggest me how to execute the chooseTopHubInEachModule argument
No, I'm highly confident that the problem relates to:
You are attempting to assign a unique colour to all values in datExpr. You only want to assign colours to your identified modules.
Take a look gain at the WGCNA tutorial:
[source: https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/Consensus-NetworkConstruction-auto.pdf - page 5]
This, then, causes the error in chooseTopHubInEachModule
Also look up in your own code where you previously used the same function:
Always check the contents of your objects so that you understand in which contexts they are to be used.
thank you very much problem solved ,it takes a lot of time for me to go through various resources over the internet then compile and run the code and people like you are of really great help
Absolutely no problem - glad to have helped out.
When I have time, I go through tutorials like WGCNA line by line and make my own comments about what is happening. I then save the script for years later
yes assign colour to the module
yes assign colour to the module