TOM matrix generated by WGCNA package in R
2.9 years ago
Dude • 0

Hi everyone! I have a problem running WGCNA code in R. The TOM matrix returned by the function "TOMsimilarityFromExpr" is filled with NA values. Why does this happen? I would appreciate it if anyone could help me with this. Thank you! 🙏 The code and results are as follows:

>  dissTOM = 1-TOMsimilarityFromExpr(datExpr, power = 8); 
> dissTOM[1:6,1:6]
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    0   NA   NA   NA   NA   NA
[2,]   NA    0   NA   NA   NA   NA
[3,]   NA   NA    0   NA   NA   NA
[4,]   NA   NA   NA    0   NA   NA
[5,]   NA   NA   NA   NA    0   NA
[6,]   NA   NA   NA   NA   NA    0

> TOM = TOMsimilarityFromExpr(datExpr, power = 8)
> TOM[1:6,1:6]
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1   NA   NA   NA   NA   NA
[2,]   NA    1   NA   NA   NA   NA
[3,]   NA   NA    1   NA   NA   NA
[4,]   NA   NA   NA    1   NA   NA
[5,]   NA   NA   NA   NA    1   NA
[6,]   NA   NA   NA   NA   NA    1
WGCNA • 2.8k views

What is the output of dim(datExpr)?

It's a gene expression set. 

> class(datExpr)
[1] "matrix" "array" 
> dim(datExpr)
[1]    95 19206

> datExpr[1:4,1:4]
             ENSG00000000003 ENSG00000000005 ENSG00000000419 ENSG00000000457
TCGA-2Y-A9H4        39.95787     0.000000000        27.44848        1.323982
TCGA-5C-AAPD        26.10105     0.000000000        24.81812        1.587723
TCGA-BC-A10W        13.57470     0.009521119        27.08928        7.692339
TCGA-BW-A5NP        28.06425     0.022192741        18.36840        4.594120
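In the rows printed above, ENSG00000000005 is zero or close to zero in all four samples shown. A gene that is constant across samples has zero variance, and a Pearson correlation involving it is undefined. The sketch below shows a quick base-R check for such genes; it uses a small made-up matrix (`toy`, with hypothetical gene names) rather than the real data.

```r
# Hypothetical toy matrix: 6 samples (rows) x 3 genes (columns).
toy <- cbind(
  geneA = c(39.9, 26.1, 13.6, 28.1, 30.2, 22.4),
  geneB = rep(0, 6),                      # zero in every sample
  geneC = c(27.4, 24.8, 27.1, 18.4, 21.0, 25.3)
)

# Per-gene variance across samples; a constant gene has variance 0.
geneVar <- apply(toy, 2, var, na.rm = TRUE)
constantGenes <- names(geneVar)[geneVar == 0]
print(constantGenes)

# cor() cannot correlate a constant gene, so it returns NA for that gene
# (and warns that the standard deviation is zero).
r <- suppressWarnings(cor(toy))
print(r)
```

On the real data, running the same `apply(..., 2, var, na.rm = TRUE)` check on `datExpr` should reveal whether any such genes are present.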

Does this also happen with the adjacency function?

Without datExpr it is difficult to understand what is going on. Would you mind sharing the matrix? You can rename the samples.


Hello! I just tried the adjacency function, and it seems to take much longer than the run without it (hours). I'm still not sure whether it will work.

>   adja <- adjacency(datExpr,power = 8)
>   dissTOM = 1-TOMsimilarityFromExpr(adja, power = 8); 
TOM calculation: adjacency..
..will not use multithreading.

The datExpr file is available at the link below. Thank you for your time and attention.

https://github.com/Datapioneer/QUESTION/tree/main


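Thanks for the file. TOMsimilarityFromExpr does not take an adjacency matrix as input; it expects the expression data and computes the adjacency itself. If you have already computed the adjacency matrix, use TOMsimilarity instead:

TOM = TOMsimilarity(adja)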

Hello everyone, I ran into the same problem and still have not found a solution...

> datExpr[1:6,1:6]
     host_Asol1 host_Asol100 host_Asol1000
D1_1            3.918257              3.599864               2.425882
D1_2            4.300090              3.145677               1.596396
D1_3            2.830688              4.001060               2.830688
D1_4            3.670657              3.850542               2.575045
D1_5            3.741992              4.152538               2.185101
D2_1            3.240862              3.922310               1.174550
     host_Asol10000 host_Asol10000 host_Asol10001
D1_1                 2.207705              -1.1080985               1.9495859
D1_2                 1.596396              -1.1080985               0.5921175
D1_3                 1.820121              -1.1080985               1.8201213
D1_4                 2.089285              -1.1080985               1.3402078
D1_5                 2.402622               0.6037009               0.6037009
D2_1                 2.150964              -1.1080985               0.5814653
> TOM[1:6;1:6]
Error: unexpected ';' in "TOM[1:6;"
> TOM[1:6,1:6]
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1   NA   NA   NA   NA   NA
[2,]   NA    1   NA   NA   NA   NA
[3,]   NA   NA    1   NA   NA   NA
[4,]   NA   NA   NA    1   NA   NA
[5,]   NA   NA   NA   NA    1   NA
[6,]   NA   NA   NA   NA   NA    1

> gsg = goodSamplesGenes(datExpr0, verbose = 5);
 Flagging genes and samples with too many missing values...
  ..step 1
> gsg$allOK
[1] TRUE

I think these warnings may point to the reason:

1: executing %dopar% sequentially: no parallel backend registered
2: In eval(xpr, envir = envir) : Some correlations are NA in block 1 : 1483 .
...
21: In eval(xpr, envir = envir) :
  Some correlations are NA in block 28178 : 29660 .
22: In eval(xpr, envir = envir) :
  Some correlations are NA in block 29661 : 30152 .
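These warnings also suggest why the whole TOM becomes NA rather than just a few entries. The sketch below computes a simple unsigned TOM in base R using the Zhang and Horvath formula with a "min" denominator, TOM[i, j] = (sum over u of a[i, u] * a[u, j] + a[i, j]) / (min(k[i], k[j]) + 1 - a[i, j]). It is an illustration under those assumptions, not WGCNA's actual implementation (which supports other network and TOM types); `tomSketch` is a hypothetical helper. Because every entry sums over all genes u, a single gene with NA correlations contaminates every off-diagonal entry.

```r
# Hypothetical helper: unsigned TOM with "min" denominator, base R only.
tomSketch <- function(expr, power = 8) {
  # Gene-gene Pearson correlations; a zero-variance gene gives NA.
  r <- suppressWarnings(cor(expr))
  a <- abs(r)^power          # unsigned soft-thresholded adjacency
  diag(a) <- 0               # ignore self-connections
  shared <- a %*% a          # sum over u of a[i, u] * a[u, j]
  k <- rowSums(a)            # connectivity of each gene
  denom <- outer(k, k, pmin) + 1 - a
  tom <- (shared + a) / denom
  diag(tom) <- 1
  tom
}

set.seed(1)
expr <- matrix(rnorm(20 * 5), nrow = 20,
               dimnames = list(NULL, paste0("gene", 1:5)))
tomOK <- tomSketch(expr)

exprBad <- expr
exprBad[, "gene5"] <- 0      # one gene that is zero in every sample
tomBad <- tomSketch(exprBad)

print(round(tomOK, 3))
print(tomBad)                # 1 on the diagonal, NA everywhere else
```

The pattern in `tomBad` matches the TOM output at the top of the thread: the diagonal is 1 and every other entry is NA, which is why removing zero-variance genes before the TOM step helps.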

Found the solution:

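# If the TOM is all NA, check for and remove genes with zero variance
# (a variance cannot be negative, so the genes to drop are those with variance == 0)

# Per-gene (column) variance across samples
variancedatExpr = as.vector(apply(as.matrix(datExpr), 2, var, na.rm = TRUE))
# Keep genes whose variance is defined and greater than zero
keepGenes = !is.na(variancedatExpr) & variancedatExpr > 0
table(keepGenes)
datExpr = datExpr[, keepGenes, drop = FALSE]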
2.9 years ago

Apparently the NaN values in the TOM are introduced by 373 genes with too many zeros:

library(readr)   # read_csv()
library(WGCNA)   # goodSamplesGenes(), TOMsimilarityFromExpr()

datExpr0 <- read_csv("D:/Download/datExpr.csv")
datExpr0 <- data.frame(datExpr0, row.names = 1)

gsg = goodSamplesGenes(datExpr0, verbose = 3);
# Flagging genes and samples with too many missing values...
#  ..step 1
#  ..Excluding 373 genes from the calculation due to too many missing samples or zero variance.
#  ..step 2
gsg$allOK
# [1] FALSE

Remove offending genes

if (!gsg$allOK)
{
    # Optionally, print the gene and sample names that were removed:
    if (sum(!gsg$goodGenes)>0)
        printFlush(paste("Removing genes:", paste(names(datExpr0)[!gsg$goodGenes], collapse = ", ")));
    if (sum(!gsg$goodSamples)>0)
        printFlush(paste("Removing samples:", paste(rownames(datExpr0)[!gsg$goodSamples], collapse = ", ")));
    # Remove the offending genes and samples from the data:
    datExpr = datExpr0[gsg$goodSamples, gsg$goodGenes]
}

Calculate TOM

TOM = TOMsimilarityFromExpr(datExpr, power = 8)

This is a follow-up on the NaN values in the TOM.

If you are working with a WGCNA version prior to 1.62 (see the change log), NaN values are introduced during the TOM calculation by completely unconnected nodes. Removing genes with too many zeros across your samples makes things slightly better. In short, the NaN values in the TOM seem to be a property of your expression matrix.
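According to its log message in the answer above, goodSamplesGenes() excludes genes with too many missing samples or zero variance, so a gene that is zero in most (but not all) samples can still pass. A minimal sketch of an extra zero-fraction filter is below; the helper name `filterMostlyZeroGenes` and the 50% cutoff are hypothetical choices, not WGCNA defaults, and would need tuning for real data.

```r
# Hypothetical helper: keep genes (columns) whose fraction of exact zeros
# across samples is at most maxZeroFraction (0.5 is an arbitrary example).
filterMostlyZeroGenes <- function(expr, maxZeroFraction = 0.5) {
  zeroFraction <- colMeans(expr == 0, na.rm = TRUE)
  keep <- zeroFraction <= maxZeroFraction
  expr[, keep, drop = FALSE]
}

# Toy data: 4 samples x 3 genes.
toy <- cbind(
  geneA = c(12.0, 8.5, 10.1, 9.7),   # never zero
  geneB = c(0, 0, 0.5, 1.2),         # zero in 2 of 4 samples
  geneC = c(0, 0, 0, 3.4)            # zero in 3 of 4 samples
)
filtered <- filterMostlyZeroGenes(toy)
print(colnames(filtered))
```

Here geneC is dropped because 75% of its values are zero, while geneB sits exactly at the cutoff and is kept. The function also works on a data frame such as datExpr, since `expr == 0` returns a logical matrix in both cases.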
