Hi All,
Currently I am working on 36 patient sample RNA-Seq data. The data contains 2 experimental condition and each condition has 18 samples. The raw count matrix is 57268x 36. I omitted genes using "rowSums(counts(dds)) > 1". As a result the count matrix became 48607 x 36. In a post, I saw somebody suggesting to remove the rows which does not have corresponding Gene symbol. But It didint sound correct to me, look like more bias. Do you agree with me ?
Also, when pearsonFallback= "none", the program reads MAD resulted 0 values as NA and gives error. Therefore, I did MAD before running the code and the number of genes decreased to 25858 from 48607. Is it correct to do the MAD before pickSoftThreshold() and blockwiseModule() ?
For my WGCNA analysis, I am using networkType=Signed hybrid, corType=bicor, pearsonFallback = "individual" or "none". I am observing 13063 genes under "turquoise" module when I set pearsonFallback as "individual" & 9697 genes under "turquoise" module when I set pearsonFallback as "none". Is this normal? If not, how should I modify my code?
I really appreciate if you give me some insight,
Thanks in advance,
DESeq2
ddsMat<- DESeqDataSetFromMatrix(seMat, sample.table.ERSvsPT, design=~Batch+Type)
dds<-DESeq(ddsMat)
Omitting low count rows
dds <- dds[ rowSums(counts(dds)) > 1, ]
datExpr0<- assay(dds)
Checking for good genes
gsg = goodSamplesGenes(datExpr0, verbose = 5);
gsg$allOK
Variance Stabilization
vsd = getVarianceStabilizedData(dds)
colnames(vsd)<- colData(dds)$Written.ID
vsd2 <- t(vsd)
datExpr<- vsd2
WGCNA analysis
powers = c(c(1:10), seq(from = 12, to=20, by=2))
sft = pickSoftThreshold(datExpr,corFnc = "bicor", corOptions=list(maxPOutliers=0.1),networkType = "signed hybrid", powerVector = powers, verbose = 5)
Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k.
1 1 0.247 -0.376 0.0625 8810.0 8920.000 17800
2 2 0.629 -0.880 0.6050 3700.0 2950.000 10800
3 3 0.727 -1.050 0.7140 1910.0 1140.000 7420
4 4 0.746 -1.140 0.7320 1130.0 484.000 5560
5 5 0.752 -1.190 0.7350 730.0 227.000 4430
6 6 0.872 -1.140 0.8570 505.0 115.000 3640
7 7 0.912 -1.130 0.9000 368.0 63.800 3060
8 8 0.940 -1.110 0.9320 278.0 37.600 2620
9 9 0.962 -1.100 0.9570 217.0 23.200 2260
10 10 0.973 -1.090 0.9700 173.0 15.000 1970
11 12 0.976 -1.100 0.9780 116.0 6.800 1570
12 14 0.965 -1.130 0.9740 82.3 3.320 1310
13 16 0.962 -1.150 0.9790 60.7 1.720 1110
14 18 0.955 -1.170 0.9780 46.3 0.955 960
15 20 0.956 -1.190 0.9800 36.2 0.541 842
pearsonFallback="individual" :
softpower = 6
nethybrid = blockwiseModules(datExpr, power = softpower,maxBlockSize = 10000,
TOMType = "signed", minModuleSize = 30,
reassignThreshold = 0, mergeCutHeight = 0.25,
numericLabels = TRUE, pamRespectsDendro = FALSE,
saveTOMs = FALSE,networkType = "signed hybrid",
verbose = 5,corType = "bicor", maxPOutliers = 0.1,
pearsonFallback = "individual")
table(moduleColors.hybrid)
moduleColors.hybrid
black blue brown cyan darkgreen
788 1804 1523 443 148
darkgrey darkorange darkred darkturquoise green
123 83 198 136 995
greenyellow grey grey60 lightcyan lightgreen
480 22270 226 309 212
lightyellow magenta midnightblue orange paleturquoise
204 556 403 91 46
pink purple red royalblue saddlebrown
716 508 888 199 52
salmon skyblue steelblue tan turquoise
456 58 47 459 13063
white yellow
78 1045
pearsonFallback="none" :
mad.datExpr =apply(datExpr,2,mad)
datExprMAD= datExpr[,mad.datExpr>0]
sft = pickSoftThreshold(datExprMAD,corFnc = "bicor", corOptions=list(maxPOutliers=0.1),networkType = "signed hybrid", powerVector = powers, verbose = 5)
Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k.
1 1 0.503 -0.746 0.495 4120.00 3870.000 8560
2 2 0.681 -1.290 0.787 1610.00 1210.000 5010
3 3 0.750 -1.540 0.860 761.00 457.000 3270
4 4 0.778 -1.680 0.890 402.00 208.000 2270
5 5 0.788 -1.760 0.907 230.00 104.000 1650
6 6 0.803 -1.790 0.925 140.00 54.900 1240
7 7 0.820 -1.800 0.943 89.60 30.400 955
8 8 0.835 -1.790 0.956 59.50 17.500 750
9 9 0.842 -1.790 0.963 40.90 10.400 598
10 10 0.853 -1.770 0.971 28.80 6.320 484
11 12 0.870 -1.740 0.981 15.40 2.540 330
12 14 0.873 -1.720 0.977 8.83 1.110 236
13 16 0.879 -1.700 0.979 5.39 0.516 174
14 18 0.879 -1.670 0.973 3.45 0.249 132
15 20 0.894 -1.620 0.983 2.31 0.125 101
softpower = 6
nethybrid.MAD = blockwiseModules(datExprMAD, power = softpower,maxBlockSize = 10000,
TOMType = "signed", minModuleSize = 30,
reassignThreshold = 0, mergeCutHeight = 0.25,
numericLabels = TRUE, pamRespectsDendro = FALSE,
saveTOMs = FALSE,networkType = "signed hybrid",
verbose = 5,corType = "bicor", maxPOutliers = 0.1,
pearsonFallback = "none")
table(moduleColors.hybrid.MAD)
moduleColors.hybrid.MAD
bisque4 black blue brown brown4
50 796 1161 1047 53
cyan darkgreen darkgrey darkmagenta darkolivegreen
451 232 221 99 99
darkorange darkorange2 darkred darkslateblue darkturquoise
165 53 232 44 228
floralwhite green greenyellow grey grey60
64 970 630 887 334
ivory lightcyan lightcyan1 lightgreen lightsteelblue1
65 370 68 324 70
lightyellow magenta mediumpurple3 midnightblue orange
312 694 72 426 166
orangered4 paleturquoise pink plum1 plum2
73 120 715 82 40
purple red royalblue saddlebrown salmon
665 838 291 140 479
sienna3 skyblue skyblue3 steelblue tan
87 148 85 122 495
thistle2 turquoise violet white yellow
36 9697 117 153 1005
yellowgreen
87
You could try to set the deepSplit argument to 3 (default is 2).
Thanks a lot for your suggestion. Which version do you think I should prefer for pearsonFallback, "individual" or "none" ?
Eh, I have no clue :) I just know that deepSplit influences the size of clusters/number of genes per cluster...
Hi unfortunately it did not solve my problem. I changed it into 3 & 1, but both caused increase in number of genes in the "turquoise" module.
Do you have any idea what is the acceptable number of genes in 1 module ?
deepSplit=3