WGCNA- Large number of genes clustering under one Module
1
3
Entering edit mode
8.2 years ago
gokce.ouz ▴ 70

Hi All,

Currently I am working on 36 patient sample RNA-Seq data. The data contains 2 experimental condition and each condition has 18 samples. The raw count matrix is 57268x 36. I omitted genes using "rowSums(counts(dds)) > 1". As a result the count matrix became 48607 x 36. In a post, I saw somebody suggesting to remove the rows which does not have corresponding Gene symbol. But It didint sound correct to me, look like more bias. Do you agree with me ?

Also, when pearsonFallback= "none", the program reads MAD resulted 0 values as NA and gives error. Therefore, I did MAD before running the code and the number of genes decreased to 25858 from 48607. Is it correct to do the MAD before pickSoftThreshold() and blockwiseModule() ?

For my WGCNA analysis, I am using networkType=Signed hybrid, corType=bicor, pearsonFallback = "individual" or "none". I am observing 13063 genes under "turquoise" module when I set pearsonFallback as "individual" & 9697 genes under "turquoise" module when I set pearsonFallback as "none". Is this normal? If not, how should I modify my code?

I really appreciate if you give me some insight,

Thanks in advance,

DESeq2

ddsMat<- DESeqDataSetFromMatrix(seMat, sample.table.ERSvsPT, design=~Batch+Type)    
dds<-DESeq(ddsMat)

Omitting low count rows

dds <- dds[ rowSums(counts(dds)) > 1, ]    
datExpr0<- assay(dds)

Checking for good genes

gsg = goodSamplesGenes(datExpr0, verbose = 5);    
gsg$allOK

Variance Stabilization

vsd = getVarianceStabilizedData(dds)  
colnames(vsd)<- colData(dds)$Written.ID    
vsd2 <- t(vsd)    
datExpr<- vsd2

WGCNA analysis

powers = c(c(1:10), seq(from = 12, to=20, by=2))
sft = pickSoftThreshold(datExpr,corFnc = "bicor", corOptions=list(maxPOutliers=0.1),networkType = "signed hybrid", powerVector = powers, verbose = 5)

   Power SFT.R.sq  slope truncated.R.sq mean.k. median.k. max.k.
1      1    0.247 -0.376         0.0625  8810.0  8920.000  17800
2      2    0.629 -0.880         0.6050  3700.0  2950.000  10800
3      3    0.727 -1.050         0.7140  1910.0  1140.000   7420
4      4    0.746 -1.140         0.7320  1130.0   484.000   5560
5      5    0.752 -1.190         0.7350   730.0   227.000   4430
6      6    0.872 -1.140         0.8570   505.0   115.000   3640
7      7    0.912 -1.130         0.9000   368.0    63.800   3060
8      8    0.940 -1.110         0.9320   278.0    37.600   2620
9      9    0.962 -1.100         0.9570   217.0    23.200   2260
10    10    0.973 -1.090         0.9700   173.0    15.000   1970
11    12    0.976 -1.100         0.9780   116.0     6.800   1570
12    14    0.965 -1.130         0.9740    82.3     3.320   1310
13    16    0.962 -1.150         0.9790    60.7     1.720   1110
14    18    0.955 -1.170         0.9780    46.3     0.955    960
15    20    0.956 -1.190         0.9800    36.2     0.541    842

pearsonFallback="individual" :

softpower = 6
nethybrid = blockwiseModules(datExpr, power = softpower,maxBlockSize = 10000,
                                 TOMType = "signed", minModuleSize = 30,
                                 reassignThreshold = 0, mergeCutHeight = 0.25,
                                 numericLabels = TRUE, pamRespectsDendro = FALSE,
                                 saveTOMs = FALSE,networkType = "signed hybrid",
                                 verbose = 5,corType = "bicor", maxPOutliers = 0.1,
                                 pearsonFallback = "individual")
    table(moduleColors.hybrid)
    moduleColors.hybrid
            black          blue         brown          cyan     darkgreen 
              788          1804          1523           443           148 
         darkgrey    darkorange       darkred darkturquoise         green 
              123            83           198           136           995 
      greenyellow          grey        grey60     lightcyan    lightgreen 
              480         22270           226           309           212 
      lightyellow       magenta  midnightblue        orange paleturquoise 
              204           556           403            91            46 
             pink        purple           red     royalblue   saddlebrown 
              716           508           888           199            52 
           salmon       skyblue     steelblue           tan     turquoise 
              456            58            47           459         13063 
            white        yellow 
               78          1045

pearsonFallback="none" :

mad.datExpr =apply(datExpr,2,mad)
datExprMAD= datExpr[,mad.datExpr>0]
sft = pickSoftThreshold(datExprMAD,corFnc = "bicor", corOptions=list(maxPOutliers=0.1),networkType = "signed hybrid", powerVector = powers, verbose = 5)


   Power SFT.R.sq  slope truncated.R.sq mean.k. median.k. max.k.
1      1    0.503 -0.746          0.495 4120.00  3870.000   8560
2      2    0.681 -1.290          0.787 1610.00  1210.000   5010
3      3    0.750 -1.540          0.860  761.00   457.000   3270
4      4    0.778 -1.680          0.890  402.00   208.000   2270
5      5    0.788 -1.760          0.907  230.00   104.000   1650
6      6    0.803 -1.790          0.925  140.00    54.900   1240
7      7    0.820 -1.800          0.943   89.60    30.400    955
8      8    0.835 -1.790          0.956   59.50    17.500    750
9      9    0.842 -1.790          0.963   40.90    10.400    598
10    10    0.853 -1.770          0.971   28.80     6.320    484
11    12    0.870 -1.740          0.981   15.40     2.540    330
12    14    0.873 -1.720          0.977    8.83     1.110    236
13    16    0.879 -1.700          0.979    5.39     0.516    174
14    18    0.879 -1.670          0.973    3.45     0.249    132
15    20    0.894 -1.620          0.983    2.31     0.125    101


softpower =  6
  nethybrid.MAD = blockwiseModules(datExprMAD, power = softpower,maxBlockSize = 10000,
                               TOMType = "signed", minModuleSize = 30,
                               reassignThreshold = 0, mergeCutHeight = 0.25,
                               numericLabels = TRUE, pamRespectsDendro = FALSE,
                               saveTOMs = FALSE,networkType = "signed hybrid",
                               verbose = 5,corType = "bicor", maxPOutliers = 0.1,
                               pearsonFallback = "none")

table(moduleColors.hybrid.MAD)
moduleColors.hybrid.MAD
        bisque4           black            blue           brown          brown4 
             50             796            1161            1047              53 
           cyan       darkgreen        darkgrey     darkmagenta  darkolivegreen 
            451             232             221              99              99 
     darkorange     darkorange2         darkred   darkslateblue   darkturquoise 
            165              53             232              44             228 
    floralwhite           green     greenyellow            grey          grey60 
             64             970             630             887             334 
          ivory       lightcyan      lightcyan1      lightgreen lightsteelblue1 
             65             370              68             324              70 
    lightyellow         magenta   mediumpurple3    midnightblue          orange 
            312             694              72             426             166 
     orangered4   paleturquoise            pink           plum1           plum2 
             73             120             715              82              40 
         purple             red       royalblue     saddlebrown          salmon 
            665             838             291             140             479 
        sienna3         skyblue        skyblue3       steelblue             tan 
             87             148              85             122             495 
       thistle2       turquoise          violet           white          yellow 
             36            9697             117             153            1005 
    yellowgreen 
             87
RNA-Seq WGCNA R • 11k views
ADD COMMENT
1
Entering edit mode

You could try to set the deepSplit argument to 3 (default is 2).

ADD REPLY
0
Entering edit mode

Thanks a lot for your suggestion. Which version do you think I should prefer for pearsonFallback, "individual" or "none" ?

ADD REPLY
0
Entering edit mode

Eh, I have no clue :) I just know that deepSplit influences the size of clusters/number of genes per cluster...

ADD REPLY
0
Entering edit mode

Hi unfortunately it did not solve my problem. I changed it into 3 & 1, but both caused increase in number of genes in the "turquoise" module.

Do you have any idea what is the acceptable number of genes in 1 module ?

deepSplit=3

 moduleColors.hybrid.3
        bisque4           black            blue           brown          brown4 
             47             528            1720             837              51 
            green     greenyellow            grey          grey60           ivory 
            622             367           22077             286              64 
      lightcyan      lightcyan1      lightgreen lightsteelblue1     lightyellow 
            290              67             249              72             247 
        magenta   mediumpurple3    midnightblue          orange      orangered4 
            389              73             298             189              76 
        skyblue3       steelblue             tan       turquoise          violet 
             81             113             362           14168             112 
          white          yellow     yellowgreen 
            155             804              83 

 moduleColors.hybrid.MAD.3
          green     greenyellow            grey          grey60       honeydew1 
            860             630             885             253              40 
          ivory  lavenderblush3       lightcyan      lightcyan1      lightgreen 
             69              42             303              70             249 
      orangered4   paleturquoise  palevioletred3            pink           plum1 
             81              98              49             796              84 
          plum2          purple             red       royalblue     saddlebrown 
             57             706             847             208             121 
         salmon         salmon4         sienna3         skyblue        skyblue1 
            421              53              88             132              31 
       skyblue2        skyblue3       steelblue             tan        thistle1 
             33              85             119             483              55 
       thistle2       turquoise          violet           white          yellow 
             55           10033              95             134             953 


 **deepSplit=1**
   moduleColors.hybrid.1
        black          blue         brown          cyan     darkgreen 
          593          1205          1141           256           102 
     darkgrey    darkorange       darkred darkturquoise         green 
           96            61           104            97           890 
  greenyellow          grey        grey60     lightcyan    lightgreen 
          413         22939           222           245           155 
  lightyellow       magenta  midnightblue        orange          pink 
          153           511           251            92           563 
       purple           red     royalblue   saddlebrown        salmon 
          486           796           115            41           263 
      skyblue           tan     turquoise         white        yellow 
           47           302         15309            58          1101 

 moduleColors.hybrid.MAD.1
         black           blue          brown           cyan      darkgreen 
           780           1726           1561            349            233 
      darkgrey    darkmagenta darkolivegreen     darkorange        darkred 
           161             95            102            152            245 
 darkturquoise          green    greenyellow           grey         grey60 
           213            957            457           1060            266 
     lightcyan     lightgreen    lightyellow        magenta  mediumpurple3 
           280            255            255            526             46 
  midnightblue         orange     orangered4  paleturquoise           pink 
           316            161             55            122            613 
         plum1         purple            red      royalblue    saddlebrown 
            63            491            937            249            137 
        salmon        sienna3        skyblue       skyblue3      steelblue 
           351             93            142             73            130 
           tan      turquoise         violet          white         yellow 
           361          10465            111            149           1034 
   yellowgreen 
            86
ADD REPLY
2
Entering edit mode
8.2 years ago
Marge ▴ 320

Did you give a look at what is in the large module? Does it correlate with one of the two groups you mention? (maybe these info can give you some hints)

I experienced getting a very large module capturing a large fraction of the data points: in my case it was due to the nature of the data rather than to pre-processing or parameter selection (I did a lot of testing but results were remarkably stable). In particular the low number of samples I had and probably a fair amount of noise were giving a single module telling me essentially which tissue I was analyzing. At that time I concluded that I didn't have enough resolution to go beyond that (i.e. identify pathways and such). Attempts to split the large module via usage of alternative parameters gave me smaller clusters, but it turned out it was just an artificial way of splitting (the composition of the two subclusters was very similar and module memberships were compatible with assignment to either of the subclusters).

I obviously don't know if this is the case for you as well, just wanted to share my experience in case it helps.

ADD COMMENT
0
Entering edit mode

I also tried different combinations of Network types but none of them give me a proper clustering. I did not check what is in the large module but actually it is highly &significantly correlation by the single trait(disease vs control). Bu I feel like as there are ~9000 genes, it is normal to get significantly high resolutions. So do you think this is not good to go beyond ?

Do you have any idea what would be the ideal number/range of genes under 1 module ?

ADD REPLY
1
Entering edit mode

I don't think there is a "correct" number of genes: in an unsupervised analysis like this you just get what you get. Excluding any potential processing problem I would rather ask: can you understand whatever you wanted to understand by performing a WGCNA on your data with a very large module containing 9K genes? For example if you want to get information at pathway level (typically what you would like to find performing a WGCNA) such a large module is probably not very informative.

Telling whether it's the case of trying further or not is beyond what I can tell without having the data in my hands :-)

ADD REPLY
0
Entering edit mode

Sorry if unrelevant;

I am working with a simple lap top then no possible to run WGCNA on a big matix then I am using only differentially expressed genes instead of 20000 genes. do you think this is not allowed in WGCNA??? because I read something about bias in using DE genes

ADD REPLY
1
Entering edit mode

You shouldn't be using DE genes as input, it will be bias for sure. You have to analyze them in blocks, try to optimize the "maxBlockSize" according to your laptops power to handle.

ADD REPLY
0
Entering edit mode

Any update on this?

I have a similar situation with my analysis of ~16000 genes. 10000 genes gets clustered into the "turquoise" module and it is significantly negatively correlated to sampleType (TypeA vs Type B; denoted as 0 or 1 in module trait analysis).

Can these be due to the number of genes differentially expressed? ~12k genes were differentially expressed (at lFC>0, p-adj<0.1). ~4K genes DE (at lFC>1, p-adj<0.05)

ADD REPLY

Login before adding your answer.

Traffic: 2789 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6