Question

WGCNA analysis in normal and tumor case

0

Entering edit mode

6.4 years ago

shivangi.agarwal800 ▴ 120

Hi

I want to perform WGCNA analysis, i am having RNA seq data (gene expression) in normal and tumor cases. I want to compare between control and tumor case with respect to modules, module-trait relationship etc. How to input control and tumor data to WGCNA?. Can I upload them by keeping in same excel sheet.? But then how it would recognize them. Please suggest. Thanks in advance.m

wgcna modules • 3.9k views

ADD COMMENT • link updated 6.4 years ago by Kevin Blighe 89k • written 6.4 years ago by shivangi.agarwal800 ▴ 120

score 5 · Answer 1 · 2018-09-30

5

Entering edit mode

6.4 years ago

Kevin Blighe 89k

You indicate that your data is currently stored in an Excel sheet, which is not typical. From where did you obtain it? Processed RNA-seq data is usually stored in a CSV or TSV file, and may be tarred and zipped and have the extension *.tar.gz. Note that storing and re-using data from an Excel sheet is not good practice. Excel adds a lot of formatting on top of your data and can even modify / manipulate it in ways that you may not anticipate.

I suggest that you go through the entire WGCNA Tutorial and then apply the methodology to your own data.

With tumour-normal data, you may consider one of the following:

run WGCNA separately for each of tumour and normal and then compare results
run WGCNA with tumour + normal together and then see which modules (and, from this, which genes contained within those modules) correlate with tumour-normal status.

If you honestly complete the online tutorial even once, you will have a better understanding. That's why the tutorial exists.

Kevin

ADD COMMENT • link 6.4 years ago by Kevin Blighe 89k

0

Entering edit mode

Ya, I meant to say that only. My data is arranged in .tsv format containing gene name in column and sample id in rows. My concern was that if we want to study the difference between two groups (control and tumor) then should we input data containing both groups or we input them separately. But as per your second suggestion (run WGCNA with tumour + normal together and then see which modules (and, from this, which genes contained within those modules) correlate with tumour-normal status), if we run them together, how to compare results between two groups. Thanks in advance.

ADD REPLY • link 6.4 years ago by shivangi.agarwal800 ▴ 120

1

Entering edit mode

When you run WGCNA with tumour + normal together, the procedure after this is:

for each module's eigenvalues, correlate (cor and cor.test) or regress (lm or glm) these to TumourNormal status (encoded as 1 and 0)
identify modules that are statistically significant from point number 1
perform literature search or pathway analysis on genes in the modules identified in point number 2

ADD REPLY • link 6.4 years ago by Kevin Blighe 89k

0

Entering edit mode

Ok thanks, I am looking into some papers. But can you elaborate more point no.1. "for each module's eigenvalues, correlate (cor and cor.test) or regress (lm or glm) these to TumourNormal status (encoded as 1 and 0)"

ADD REPLY • link 6.4 years ago by shivangi.agarwal800 ▴ 120

0

Entering edit mode

Hey, I meant to generate a plot like this, where the modules are correlated to your traits / phenotypes:

https://www.nature.com/articles/srep39614/figures/3

While correlating to 1 and 0 is not ideal, it can still be performed. There is a function in WGCNA to generate that plot but I canot find it right now. I have a similar function here, if you need it:

https://github.com/kevinblighe/CorLevelPlot

------------------------------------

A regression analysis of the modules to TumourStatus is also possible. For example:

glm(TumourNormalStatus ~ blueModule, data=data, family=binomial(link='logit'))
glm(TumourNormalStatus ~ pinkModule, data=data, family=binomial(link='logit'))
et cetera

ADD REPLY • link 6.4 years ago by Kevin Blighe 89k

0

Entering edit mode

Ya, this module-trait relationship plot, I have got. But to perform regression analysis of module to tumorstatus, I am giving command like, but getting error

glm(TumourNormalStatus ~ darkgreen, family=binomial(link='logit'))

Error in eval(predvars, data, env) : object 'TumourNormalStatus' not found

I dont know how to define tumornormalstatus. Can you help me to define this?. Lots of thanks. I have used codes as:

    mydata = read.csv("up_down_expression_all_samples3_only_normal.csv", sep = "\t");
datExpr0 = as.data.frame(t(mydata[, -c(1)]));
names(datExpr0) = mydata$GENE;
rownames(datExpr0) = names(mydata)[-c(1)];
gsg = goodSamplesGenes(datExpr0, verbose = 3);
gsg$allOK
sampleTree = hclust(dist(datExpr0), method = "average");
clust = cutreeStatic(sampleTree, minSize = 10)
 datExpr = datExpr0[keepSamples, ]
nGenes = ncol(datExpr)
nSamples = nrow(datExpr)
traitData = read.csv("trait2.csv");
allTraits = traitData[,];
femaleSamples = rownames(datExpr);
traitRows = match(femaleSamples, allTraits$CaseID);
datTraits = allTraits[traitRows,-1];
rownames(datTraits) = allTraits[traitRows,1];
sampleTree2 = hclust(dist(datExpr), method = "average")
traitColors = numbers2colors(datTraits, signed = FALSE);
powers = c(c(1:10), seq(from = 12, to=20, by=2))
sft = pickSoftThreshold(datExpr, powerVector = powers, verbose = 5)
text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
     labels=powers,cex=cex1,col="red");
abline(h=0.90,col="red")
text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1,col="red")
softPower = 14;
adjacency = adjacency(datExpr, power = softPower);
TOM = TOMsimilarity(adjacency);
dissTOM = 1-TOM
geneTree = hclust(as.dist(dissTOM), method = "average");
minModuleSize = 30;
dynamicMods = cutreeDynamic(dendro = geneTree, distM = dissTOM,
                            deepSplit = 2, pamRespectsDendro = FALSE,
                            minClusterSize = minModuleSize);
table(dynamicMods)
mods<- table(dynamicMods)
write.csv (mods, "modules.csv")
dynamicColors = labels2colors(dynamicMods)
table(dynamicColors)
MEList = moduleEigengenes(datExpr, colors = dynamicColors)
MEs = MEList$eigengenes
MEDiss = 1-cor(MEs);
METree = hclust(as.dist(MEDiss), method = "average");
MEDissThres = 0.25
abline(h=MEDissThres, col = "red")
merge = mergeCloseModules(datExpr, dynamicColors, cutHeight = MEDissThres, verbose = 3)
mergedColors = merge$colors;
mergedMEs = merge$newMEs;
moduleColors = mergedColors
colorOrder = c("grey", standardColors(50));
moduleLabels = match(moduleColors, colorOrder)-1;
MEs = mergedMEs;
nGenes = ncol(datExpr);
nSamples = nrow(datExpr);
MEs0 = moduleEigengenes(datExpr, moduleColors)$eigengenes
MEs = orderMEs(MEs0)
moduleTraitCor = cor(MEs, datTraits, use = "p");
moduleTraitPvalue = corPvalueStudent(moduleTraitCor, nSamples);
sizeGrWindow(10,6)
textMatrix =  paste(signif(moduleTraitCor, 2), "\n(",
                    signif(moduleTraitPvalue, 1), ")", sep = "");
dim(textMatrix) = dim(moduleTraitCor)
par(mar = c(6, 8.5, 3, 3));
# Display the correlation values within a heatmap plot
labeledHeatmap(Matrix = moduleTraitCor,
               xLabels = names(datTraits),
               yLabels = names(MEs),
               ySymbols = names(MEs),
               colorLabels = FALSE,
               colors = greenWhiteRed(50),
               textMatrix = textMatrix,
               setStdMargins = FALSE,
               cex.text = 0.5,
               zlim = c(-1,1),
               main = paste("Module-trait relationships"))
Menopausestatus = as.data.frame(datTraits$Menopausestatus);

names(Menopausestatus) = "Menopausestatus"
modNames = substring(names(MEs), 3)

geneModuleMembership = as.data.frame(cor(datExpr, MEs, use = "p"));
MMPvalue = as.data.frame(corPvalueStudent(as.matrix(geneModuleMembership), nSamples));

names(geneModuleMembership) = paste("MM", modNames, sep="");
names(MMPvalue) = paste("p.MM", modNames, sep="");

geneTraitSignificance = as.data.frame(cor(datExpr, Menopausestatus, use = "p"));
GSPvalue = as.data.frame(corPvalueStudent(as.matrix(geneTraitSignificance), nSamples));

names(geneTraitSignificance) = paste("GS.", names(Menopausestatus), sep="");
names(GSPvalue) = paste("p.GS.", names(Menopausestatus), sep="");
module = "pink"
column = match(module, modNames);
moduleGenes = moduleColors==module;

ADD REPLY • link 6.4 years ago by shivangi.agarwal800 ▴ 120

0

Entering edit mode

you can just add 0 or 1 as tumour status

ADD REPLY • link 5.4 years ago by 1769mkc ★ 1.2k

0

Entering edit mode

old question but its more related to R ,I get module plus the tumour status as one data frame , " data=data" this is the total dataframe .isnt it?

ADD REPLY • link 5.2 years ago by 1769mkc ★ 1.2k