Question

Heatmap in R (WGCNA): error while plotting TOM

3

11.4 years ago

pixie@bioinfo ★ 1.5k

I am performing automatic network construction using WGCNA, exactly as given in the tutorial. I could retrieve all the modules. However, I get an error when plotting the TOM. When I do it for a random subset of genes, I am able to generate a heatmap. Is it a memory problem? Kindly help me out with this. Thank you.

> dissTOM = 1-TOMsimilarityFromExpr(datExpr, power = 6);
Rough guide to maximum array size: about 46000 x 46000 array of doubles..
TOM calculation: adjacency..
..will use 32 parallel threads.
Fraction of slow calculations: 0.000000
..connectivity..
..matrix multiply..
..normalize..
..done.

> plotTOM = dissTOM^7;
> diag(plotTOM) = NA;
> sizeGrWindow(9,9)
> TOMplot(plotTOM, geneTree, moduleColors, main = "Network heatmap plot, all genes"
+ )

Error in .heatmap(as.matrix(dissim), Rowv = as.dendrogram(dendro, hang = 0.1),  :
row dendrogram ordering gave index of wrong length

r heatmap • 7.1k views

ADD COMMENT • link updated 2.9 years ago by camilo • 0 • written 11.4 years ago by pixie@bioinfo ★ 1.5k

1

Were you able to solve this?

ADD REPLY • link 8.2 years ago by Dataminer ★ 2.8k

1

I had the same error, so maybe what worked for me will work for you. I suspected it might be due to input size and memory issues, so I reduced my input from 35 observations of ~8000 genes to 15 observations of about 2500 genes, just to test it out. The smaller data set plotted perfectly, just like in the tutorial. I have no idea why this would produce this particular error, though.

ADD REPLY • link 8.1 years ago by RossCampbell ▴ 140

0

Hi,

Indeed, you are right. I don't know why the WGCNA guys haven't fixed this problem.

Thank you

ADD REPLY • link 8.1 years ago by Dataminer ★ 2.8k

0

@Ross Campbell: But how do you reduce the list?

ADD REPLY • link 8.1 years ago by Dataminer ★ 2.8k

0

I just opened it in Excel and chopped off a bunch of columns and about half the rows. Obviously that messes up the data; I only did it to test the idea. I'm going to try running the complete data set on a high-performance cluster, hopefully later today.

ADD REPLY • link 8.1 years ago by RossCampbell ▴ 140

0

Hi,

Isn't there a better method, like selecting the rows with the most significant differences or something like that?

ADD REPLY • link 8.1 years ago by Dataminer ★ 2.8k

2

The best method would be to keep all the data, I believe. WGCNA is designed to be an unsupervised process, so selecting rows by a difference threshold could skew the clustering. It would be better to keep all the data intact and run the full data set on a more powerful computer. I only trimmed mine to troubleshoot the error and debug code on my local machine. After that I ran the full data set on a server and it worked fine.

ADD REPLY • link 8.1 years ago by RossCampbell ▴ 140

score 1 · Answer 1 · 2021-12-17

1

3.0 years ago

zhenminx ▴ 10

My problem is almost the same as the original poster's, at the TOM step. However, I used the one-step network construction code from WGCNA, with "maxBlockSize = nGenes". The code and error are as follows:

dissTOM = 1-TOMsimilarityFromExpr(dataExpr, power = 18)
TOMplot(plotTOM, geneTree, moduleColors, main = "Network heatmap plot, all genes")
Error in x[, iy] : subscript out of bounds

The data I used for WGCNA has 22662 genes and 43 samples.

ADD COMMENT • link 3.0 years ago by zhenminx ▴ 10

0

Hi zhenminx, how did you solve that error? I got the same one.

ADD REPLY • link 2.9 years ago by camilo • 0

score 0 · Answer 2 · 2017-11-30

Hello,

I also ran into this error at this step.
After searching for a solution, I found a post suggesting a possible fix.
It said the main cause may be a length mismatch among the three variables passed to TOMplot (plotTOM, geneTree and moduleColors): geneTree is shorter than the other two.
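
One quick way to check for this kind of mismatch before calling TOMplot is to compare the three sizes directly. The sketch below uses only base R. checkTOMplotInputs is a hypothetical helper name (it is not a WGCNA function), it assumes the tree is an hclust object, and the toy objects merely stand in for plotTOM, geneTree and moduleColors.

```r
# Hypothetical helper (not part of WGCNA): report the three sizes that
# have to agree before TOMplot() can draw the heatmap.
checkTOMplotInputs <- function(dissim, dendro, colors) {
  sizes <- c(matrix = nrow(as.matrix(dissim)),
             tree   = length(dendro$order),   # number of leaves in an hclust tree
             colors = length(colors))
  list(sizes = sizes, consistent = length(unique(sizes)) == 1)
}

# Toy stand-ins: 6 "genes" in the dissimilarity matrix and the color vector.
set.seed(1)
toyDiss   <- as.matrix(dist(matrix(rnorm(30), nrow = 6)))
toyColors <- rep(c("blue", "turquoise"), each = 3)

# A tree built on all 6 genes, and one built on only the first 4
# (mimicking a dendrogram that covers a single block).
fullTree  <- hclust(as.dist(toyDiss), method = "average")
blockTree <- hclust(as.dist(toyDiss[1:4, 1:4]), method = "average")

checkTOMplotInputs(toyDiss, fullTree,  toyColors)$consistent   # TRUE
checkTOMplotInputs(toyDiss, blockTree, toyColors)$consistent   # FALSE
```

On real data the call would be checkTOMplotInputs(plotTOM, geneTree, moduleColors); if the tree entry is smaller than the other two, the dendrogram does not cover all genes.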

Back to the source of geneTree:

geneTree = net$dendrograms[[1]]

net = blockwiseModules(datExpr, power = 6, TOMType = "unsigned", minModuleSize = 30, reassignThreshold = 0, mergeCutHeight = 0.25, numericLabels = TRUE, pamRespectsDendro = FALSE, saveTOMs = TRUE, saveTOMFileBase = "femaleMouseTOM", verbose = 3)

The root cause is that maxBlockSize in the blockwiseModules function defaults to 5000.
When the data set has more genes than that, the function automatically splits the data into several blocks, so net$dendrograms[[1]] covers only the genes in the first block.

(maxBlockSize: integer giving maximum block size for module detection. Ignored if blocks above is non-NULL. Otherwise, if the number of genes in datExpr exceeds maxBlockSize, genes will be pre-clustered into blocks whose size should not exceed maxBlockSize. https://www.rdocumentation.org/packages/WGCNA/versions/1.61/topics/blockwiseModules)

The suggested fix is to set maxBlockSize to a value at least as large as the number of genes in your own data.
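
As a minimal sketch of that suggestion, assuming datExpr has samples in rows and genes in columns as in the tutorial: the simulated datExpr and the module settings below are placeholders only, and the WGCNA call runs only if the package is installed. Keep in mind that a single block needs enough RAM for the full TOM on real data.

```r
# Simulated expression data (placeholder): 30 samples, 180 genes in 3 groups
# that each follow one latent signal plus noise.
set.seed(42)
nSamples <- 30
signals  <- matrix(rnorm(nSamples * 3), nrow = nSamples)
datExpr  <- signals[, rep(1:3, each = 60)] +
  matrix(rnorm(nSamples * 180, sd = 0.5), nrow = nSamples)
rownames(datExpr) <- paste0("sample", seq_len(nSamples))
colnames(datExpr) <- paste0("gene", seq_len(ncol(datExpr)))

nGenes <- ncol(datExpr)

if (requireNamespace("WGCNA", quietly = TRUE)) {
  suppressPackageStartupMessages(library(WGCNA))

  # maxBlockSize >= nGenes keeps all genes in one block
  net <- blockwiseModules(datExpr, power = 6,
                          maxBlockSize = nGenes,
                          TOMType = "unsigned", minModuleSize = 30,
                          mergeCutHeight = 0.25, numericLabels = TRUE,
                          verbose = 0)

  geneTree     <- net$dendrograms[[1]]
  moduleColors <- labels2colors(net$colors)

  # With a single block, the tree and the colors both cover every gene.
  stopifnot(length(net$dendrograms) == 1,
            length(geneTree$order) == nGenes,
            length(moduleColors) == nGenes)
}
```

If length(net$dendrograms) is greater than 1 on your own data, the genes were still split into blocks and the all-genes TOMplot call will fail in the same way.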

Hope this will help.

ref site: http://www.biotrainee.com/thread-205-1-1.html

score 0 · Answer 3 · 2017-12-01

In addition, I just found that the WGCNA tutorials already include a caution about this problem. It is indeed related to input size and memory, as RossCampbell said above.

Here is the content in the tutorials:

"A word of caution for the readers who would like to adapt this code for their own data. The function blockwiseModules has many parameters, and in this example most of them are left at their default value. We have attempted to provide reasonable default values, but they may not be appropriate for the particular data set the reader wishes to analyze. We encourage the user to read the help file provided within the package in the R environment and experiment with tweaking the network construction and module detection parameters. The potential reward is, of course, better (biologically more relevant) results of the analysis.

A second word of caution concerning block size. In particular, the parameter maxBlockSize tells the function how large the largest block can be that the reader’s computer can handle. The default value is 5000 which is appropriate for most modern desktops. Note that if this code were to be used to analyze a data set with more than 5000 probes, the function blockwiseModules will split the data set into several blocks. This will break some of the plotting code below, that is executing the code will lead to errors. Readers wishing to analyze larger data sets need to do one of the following:
• If the reader has access to a large workstation with more than 4 GB of memory, the parameter maxBlockSize can be increased. A 16GB workstation should handle up to 20000 probes; a 32GB workstation should handle perhaps 30000. A 4GB standard desktop or a laptop may handle up to 8000-10000 probes, depending on operating system and how much memory is in use by other running programs.
• If a computer with large-enough memory is not available, the reader should follow Section 2.c, Dealing with large datasets, and adapt the code presented there for their needs. In general it is preferable to analyze a data set in one block if possible, although in Section 2.c we present a comparison of block-wise and single-block analysis that indicates that the results are very similar."

Ref: Peter Langfelder and Steve Horvath, "Tutorial for the WGCNA package for R: I. Network analysis of liver expression data in female mice. 2.a Automatic network construction and module detection", November 25, 2014
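
For a rough sense of why block size is tied to memory: a dense nGenes x nGenes matrix of doubles takes 8 * nGenes^2 bytes. The sketch below only does that arithmetic (denseMatrixGiB is a hypothetical helper name, not a WGCNA function). Treat the result as a lower bound, since the code in the question already holds two such matrices (dissTOM and plotTOM), and the calculation and plotting steps may need further working copies.

```r
# Size in GiB of one dense nGenes x nGenes matrix of doubles (8 bytes each).
denseMatrixGiB <- function(nGenes) nGenes^2 * 8 / 1024^3

sizes <- c(5000, 20000, 22662, 30000)
data.frame(nGenes = sizes, GiB = round(denseMatrixGiB(sizes), 2))
```

For the 22662-gene data set mentioned in this thread, that comes to about 3.8 GiB for a single matrix.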