Question

Regarding finding hub genes using WGCNA

3.2 years ago

seta ★ 1.9k

Dear all,

I have a gene expression microarray dataset (about 17,000 genes) from about 400 cancer samples spanning different cancer subtypes. I treated the subtypes as binary traits and used WGCNA to find modules associated with the traits and to identify hub genes. I used the 50% of genes with the highest variance as input for WGCNA and selected the signed network type. Could you please help me with the following issues?

The green module was one of the modules associated with one of the cancer subtypes (trait), so I tried to find hub genes in this module using the criteria GS > 0.2 & MM.green > 0.8. This returned 6 genes; however, when I checked them further, I found that two of them belonged to another module, not the green module. The same thing happened when I looked for hub genes in another associated module. Could you please tell me why this happened? What’s wrong?

Regarding modules with a negative association with the binary trait, how should we interpret them, especially in terms of gene expression in those modules? Is the choice between a signed and an unsigned network important for this interpretation?

Thanks in advance

gene-expression hub-genes WGCNA • 4.5k views

updated 3.2 years ago by andres.firrincieli 3.8k • written 3.2 years ago by seta ★ 1.9k

score 1 · Answer 1 · 2021-08-27

3.2 years ago

huynguyen96.dnu ▴ 50

Hi, I have just published a paper using WGCNA in which I improved WGCNA a little. You can see it here. I believe that once you have finished reading the paper, you will be able to answer your two questions yourself. In addition, the R code for the improved version of WGCNA is available on our GitHub (https://github.com/huynguyen250896/drivergene). However, should you still have any concerns, do not hesitate to post your question here.

Thank you for your response. I took a look at your paper and will try it. However, I could not find the answers to my questions there. This is my first time using WGCNA; could you please tell me the specific answers to my questions?

reply 3.2 years ago by seta ★ 1.9k

score 1 · Answer 2 · 2021-08-28

The green module was one of the modules associated with one of the cancer subtypes (trait), so I tried to find hub genes in this module using the criteria GS > 0.2 & MM.green > 0.8. This returned 6 genes; however, when I checked them further, I found that two of them belonged to another module, not the green module. The same thing happened when I looked for hub genes in another associated module. Could you please tell me why this happened? What’s wrong?

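A gene assigned to another module should not show up as a hub gene of the green module, so I guess something went wrong during the selection. The output of the signedKME function should look like the data.frame below: every gene gets an MM value for every module, not only for the module it is assigned to.

For example, gene00011 belongs to the blue module (MM.blue = 0.94) but still has an MM of 0.82 for the cyan module.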

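library(WGCNA)
# Module membership (MM): correlation of each gene with each module eigengene
datKME_WT <- signedKME(multiExpr$WT$data, mergedMEs_WT, outputColumnName = "MM.")
# Label the module colours with the gene names (the column names of datExpr)
names(mergedColors_WT) <- names(datExpr)
matrix_mergedColors_WT <- as.matrix(mergedColors_WT)
# Join the module assignment (column V1) with the MM values by gene name
datKME_moduleColor_WT <- merge(matrix_mergedColors_WT, datKME_WT, by = "row.names")
head(datKME_moduleColor_WT)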

       Row.names           V1     MM.cyan MM.lightcyan MM.lightgreen   MM.salmon      MM.blue MM.greenyellow    MM.black
    1  gene00001   lightgreen  0.18921980  0.034292334    0.63531697  0.13602407  0.305031489    0.099648423 -0.07497855
    2  gene00002         blue  0.64468087  0.334272733   -0.03860808  0.63641185  0.753479431    0.574225848 -0.18239362
    3  gene00003         blue  0.07705021 -0.322657990    0.03485713  0.31640875  0.391720482    0.517038670  0.27108623
    4  gene00004         blue  0.58244822  0.032159210    0.36066706  0.68577835  0.854280546    0.671092702 -0.12259797
    5  gene00005  greenyellow  0.15336767 -0.332123183    0.11236861  0.70933620  0.687739541    0.881175376  0.39838955
    6  gene00006    turquoise  0.09267411  0.493756319    0.06919422 -0.32990762 -0.396841629   -0.731454202 -0.54741366
    7  gene00007    turquoise  0.16573510  0.500944581    0.06799996 -0.53739752 -0.412083015   -0.811240414 -0.64712763
    8  gene00008    turquoise  0.29941876  0.489617894    0.16386992 -0.13675715 -0.082734470   -0.476539603 -0.57997252
    9  gene00009  greenyellow  0.18869704 -0.163737021    0.04222775  0.76679584  0.674572313    0.852283904  0.38104120
    10 gene00010         blue  0.53615218  0.058295002    0.55403638  0.72056126  0.866062467    0.663952993 -0.07039581
    11 gene00011         blue  0.82303113  0.288671439    0.48615401  0.60648940  0.938877068    0.544199035 -0.38204120
    12 gene00012         blue  0.77350270  0.185758239    0.40904636  0.64260957  0.978811408    0.672022879 -0.27076743
    13 gene00013         blue  0.79738706  0.260010275    0.37892031  0.65266143  0.967657338    0.645249951 -0.30468609
    14 gene00014         blue  0.75616479  0.179855221    0.39774042  0.66900697  0.981235158    0.699922205 -0.24122598
    15 gene00015         blue  0.81332463  0.260093653    0.37202946  0.63195789  0.959731945    0.625734130 -0.33064164
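To see why a gene can have a high MM for a module it is not assigned to, here is a minimal sketch in base R. It assumes that MM is, in essence, the correlation between a gene's expression profile and a module eigengene; the eigengenes and genes below (ME_blue, ME_cyan, g1 to g3) are made-up toy values, not data from this thread.

```r
# Two hypothetical module eigengenes across 6 samples; they are correlated
# with each other, as eigengenes of related modules often are
ME_blue <- c(1, 2, 3, 4, 5, 6)
ME_cyan <- c(1, 2, 3, 4, 6, 5)
MEs <- data.frame(blue = ME_blue, cyan = ME_cyan)

# Toy expression data (samples x genes)
expr <- data.frame(
  g1 = 2 * ME_blue + 1,       # follows the blue eigengene exactly
  g2 = 3 * ME_cyan,           # follows the cyan eigengene exactly
  g3 = c(6, 1, 5, 2, 4, 3)    # unrelated pattern
)

# Genes x modules matrix of correlations, one MM value per gene per module
MM <- cor(expr, MEs)
colnames(MM) <- paste0("MM.", colnames(MM))
print(round(MM, 2))
```

Here g1 tracks the blue eigengene perfectly, yet its MM.cyan is still above 0.8, simply because the two eigengenes are correlated.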

Let's say cyan is the module of interest and you are looking for genes in this module with MM > 0.8. If you subset the MM data.frame based only on the values in MM.cyan:

# Filter on MM.cyan only; the module assignment (column V1) is ignored here
Cyan_08 <- subset(datKME_moduleColor_WT, MM.cyan > 0.8)
head(Cyan_08)
   Row.names   V1   MM.cyan MM.lightcyan MM.lightgreen MM.salmon   MM.blue MM.greenyellow   MM.black MM.lightyellow
11 gene00011 blue 0.8230311    0.2886714     0.4861540 0.6064894 0.9388771    0.544199035 -0.3820412  -0.4310391298
15 gene00015 blue 0.8133246    0.2600937     0.3720295 0.6319579 0.9597319    0.625734130 -0.3306416  -0.3246696645
26 gene00028 blue 0.8065564    0.2741694     0.5267171 0.4999075 0.8731536    0.443267201 -0.4634928  -0.3310028561
28 gene00030 cyan 0.8477074    0.3489527     0.3331260 0.1508366 0.6455064    0.127876728 -0.6717292   0.0902075371
60 gene00065 cyan 0.9444121    0.5703349     0.4319337 0.1575432 0.6425243   -0.004577362 -0.8415948   0.0008084272
61 gene00066 cyan 0.9018271    0.4143577     0.6287749 0.1240565 0.6861586    0.042573472 -0.7723868  -0.0902572846

genes of the blue module will also be included in Cyan_08, so you need to filter on the module assignment (column V1) as well.
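To keep only genes that are actually assigned to the module of interest, the subset can test the module column together with the MM threshold. A minimal sketch, using a toy data.frame that copies three columns from the Cyan_08 output above (Cyan_hubs is just an illustrative name):

```r
# Toy copy of the relevant columns (values taken from the Cyan_08 output)
datKME_moduleColor_WT <- data.frame(
  Row.names = c("gene00011", "gene00015", "gene00028",
                "gene00030", "gene00065", "gene00066"),
  V1        = c("blue", "blue", "blue", "cyan", "cyan", "cyan"),
  MM.cyan   = c(0.8230311, 0.8133246, 0.8065564,
                0.8477074, 0.9444121, 0.9018271),
  stringsAsFactors = FALSE
)

# Filtering on MM.cyan alone keeps the three blue genes as well
Cyan_08 <- subset(datKME_moduleColor_WT, MM.cyan > 0.8)

# Requiring the module assignment too keeps only genes assigned to cyan
Cyan_hubs <- subset(datKME_moduleColor_WT, V1 == "cyan" & MM.cyan > 0.8)
print(Cyan_hubs$Row.names)
```

If a gene significance column has been merged into the same data.frame, a GS threshold can be added as one more condition in the same subset call.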

Regarding modules with a negative association with the binary trait, how should we interpret them, especially in terms of gene expression in those modules? Is the choice between a signed and an unsigned network important for this interpretation?

If you have a signed network, a negative association with a binary trait (subtypeA) means that the expression of the genes contributing to the first PC (module eigengene) of a given module is lower in the subtypeA samples. I always build 'signed' networks because they are much easier to interpret.
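A toy illustration of that interpretation in base R. All expression values are made up, and the eigengene is approximated here by the first principal component of the scaled expression, oriented along the average expression (which, to my understanding, is also what WGCNA's moduleEigengenes does by default); treat it as a sketch rather than the exact WGCNA computation.

```r
# Toy expression matrix (samples x genes); every gene is lower in subtypeA
expr <- rbind(
  A1 = c(2.0, 3.1, 1.9, 2.5),
  A2 = c(2.2, 2.9, 2.1, 2.4),
  A3 = c(1.9, 3.0, 2.0, 2.6),
  B1 = c(5.1, 6.0, 4.8, 5.5),
  B2 = c(4.9, 6.2, 5.1, 5.4),
  B3 = c(5.0, 5.9, 5.0, 5.6)
)
colnames(expr) <- paste0("gene", 1:4)
subtypeA <- c(1, 1, 1, 0, 0, 0)  # binary trait: 1 = subtypeA, 0 = other

# First principal component as a stand-in for the module eigengene
pc <- prcomp(expr, center = TRUE, scale. = TRUE)
ME <- pc$x[, 1]

# The sign of a PC is arbitrary, so orient it along the average expression
if (cor(ME, rowMeans(scale(expr))) < 0) ME <- -ME

# Negative module-trait correlation: the eigengene is lower in subtypeA
print(cor(ME, subtypeA))
print(tapply(ME, subtypeA, mean))
```

In an unsigned network a module can mix genes that go up and genes that go down, so the same negative correlation would not tell you the direction for every gene in the module.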