What is the meaning of "weight" in the WGCNA Cytoscape edge output file?
7.0 years ago
elenajmichel ▴ 90

This is probably a simple question and my apologies if that is the case. I have a question regarding the meaning of the "weight" column in the Cytoscape Output Edge File after using WGCNA (this may apply to other output files, but I do not know if the name is the same). Obviously I understand what the weight is, but in this case is the weight the p-value? So a lower number is more significant? Or does a higher weight mean two nodes are more related?

I've looked everywhere but can't find an answer, so apologies if I missed something obvious. Thank you very much in advance!

WGCNA edge weight weight • 11k views

How did you get from WGCNA to Cytoscape, i.e., how did you input your data into Cytoscape?

What is the range of the weights that you're seeing?


Hi Kevin,

I used the Cytoscape export function in WGCNA (the function is exportNetworkToCytoscape) to create an edge file, so I get a file with headers as such: fromNode toNode weight direction fromAltName toAltName. Then in Cytoscape I load that file as a network and select fromNode as the Source Node; toNode as the Target Node; and weight as the Edge Attribute.

The range of weights that I see is from about 0.001 up to 0.5, which leads me to believe it is a p-value and that I should look for the smaller numbers to get the best correlation, but I want to be sure! This matters especially because I like to filter the coexpressors of my bait genes so that I don't get a hairball when I look at the network (e.g. take the top 30 coexpressors for each bait gene). Therefore it is very important that I select the coexpressors that are most highly correlated, and not the opposite.


Hi! I'm not so sure that it's a p-value. I believe it is just the edge weight from the adjacency matrix. If you look at the default parameters for the exportNetworkToCytoscape function, the documentation says the following for the threshold parameter:

adjacency threshold for including edges in the output.

The default value for this just so happens to be 0.5, which is the max value that you see for edge weights in Cytoscape.

What do you think?


Hi Kevin,

Yes, that makes sense! With this understanding, I would take values closest to 0.5 to be the "best" edges. Would you agree?


Hi! Yes, that is my interpretation of it. I would just modify the threshold parameter to something like 0.1 and then 0.9 just to be sure! It is not entirely clear from the WGCNA manual.


Hi Kevin,

That's a great idea! Thanks so much for your discussion.


Okay, it was no problem. Best of luck with the work!

6.0 years ago
Renesh ★ 2.2k

In the Cytoscape edge output file, the weight column gives the connection strength between two nodes (genes). When the Topological Overlap Matrix (TOM) obtained from the TOMsimilarity function is used as input, the weights are the TOM values for the respective gene pairs. This value is always between 0 and 1. A higher value indicates a stronger connection (co-expression) between the two genes.
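Since a higher weight means a stronger connection, picking the "top 30 coexpressors" for a bait gene means sorting by weight in descending order. A minimal sketch in Python (not WGCNA's own R code); the function name `top_partners` and the toy gene names are hypothetical, and the tuples stand in for the fromNode, toNode and weight columns of the exported edge file:

```python
from collections import defaultdict


def top_partners(edges, bait_genes, n=30):
    """Return the n highest-weight partners for each bait gene.

    edges: iterable of (from_node, to_node, weight) tuples, mirroring the
    fromNode / toNode / weight columns of the exported edge file.
    """
    baits = set(bait_genes)
    partners = defaultdict(list)
    for a, b, w in edges:
        # The network is undirected, so a bait may sit in either column.
        if a in baits:
            partners[a].append((b, w))
        if b in baits:
            partners[b].append((a, w))
    # Higher weight = stronger connection, so sort in descending order.
    return {
        bait: sorted(found, key=lambda p: p[1], reverse=True)[:n]
        for bait, found in partners.items()
    }


# Toy edge list (made-up values, for illustration only).
edges = [
    ("geneA", "geneB", 0.42),
    ("geneA", "geneC", 0.05),
    ("geneD", "geneA", 0.31),
    ("geneB", "geneC", 0.20),
]
top = top_partners(edges, ["geneA"], n=2)
print(top)  # {'geneA': [('geneB', 0.42), ('geneD', 0.31)]}
```

Sorting in ascending order instead, as one would for a p-value, would select the weakest partners.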


How should one select an appropriate threshold for exporting the network? Is it arbitrary and just a way to limit the number of connections to visualise, or does it have any meaning?

2.7 years ago

I have a question. The help file gives the signature exportNetworkToCytoscape(adjMat, edgeFile = NULL, nodeFile = NULL, weighted = TRUE, threshold = 0.5, nodeNames = NULL, altNodeNames = NULL, nodeAttr = NULL, includeColNames = TRUE) and describes adjMat as an "adjacency matrix giving connection strengths among the nodes in the network". Why, then, does the tutorial always pass the TOM as input? Could you explain it? Thanks


The network in WGCNA is the TOM.

It is explained here


Thanks. I find the max value of TOM in my analysis is about 0.57, not close to 1. Does it mean my data is not co-expressed well? And I looked at the code of the function chooseTopHubInEachModule(). It uses the row sums of the adjacency matrix to choose hub genes. Also, there is a function intramodularConnectivity(). It uses the row sums of the adjacency matrix as well. Why do they both use adjMat and not the TOM? Thanks


I find the max value of TOM in my analysis is about 0.57

Always keep in mind that the TOM is essentially an adjacency matrix of correlation values raised to a power β. If β = 6, then a maximum value of 0.57 in your TOM corresponds to a correlation of about 0.57^(1/6) ≈ 0.91.
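The arithmetic can be checked with a short sketch (Python, for illustration only). It treats the weight as a plain power adjacency, which is the simplification made in the reply above; the function name `implied_correlation` is hypothetical. The `signed=True` branch assumes the signed adjacency definition ((1 + cor) / 2)^β and is included only because a signed network comes up later in the thread:

```python
def implied_correlation(weight, beta, signed=False):
    """Back out the correlation implied by a power-adjacency weight.

    unsigned: weight = |cor| ** beta
    signed:   weight = ((1 + cor) / 2) ** beta   (assumed definition)
    """
    root = weight ** (1.0 / beta)
    return 2.0 * root - 1.0 if signed else root


r_unsigned = implied_correlation(0.57, 6)
r_signed = implied_correlation(0.57, 6, signed=True)
print(round(r_unsigned, 2))  # 0.91
print(round(r_signed, 2))    # 0.82
```

Because the TOM also mixes in shared-neighbour information, these numbers are only a rough guide to the underlying correlation, not an exact inversion.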

Does it mean my data is not co-expressed well?

In the context of WGCNA, a single correlation value does not tell you anything about co-expression.

And I looked at the code of the function chooseTopHubInEachModule(). It uses the row sums of the adjacency matrix to choose hub genes. Also, there is a function intramodularConnectivity().

When you calculate the TOM you will get rid of spurious or missing connections in the adjacency matrix (read Topological overlap matrix in the Methods section). If you read the paper, you can see that the TOM improves the clustering. Look at the differences in Fig 4 between GTOM0 (adjacency) and GTOM1. The properties of the single nodes (e.g. connectivity) are instead calculated from the adjacency matrix.
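To make the distinction concrete, here is a small pure-Python sketch of the unsigned topological overlap formula, TOM_ij = (Σ_u a_iu·a_uj + a_ij) / (min(k_i, k_j) + 1 − a_ij), next to connectivity as a row sum of the adjacency matrix. It is an illustration of the published formula, not the TOMsimilarity implementation, and the 3×3 adjacency matrix is made up:

```python
def connectivity(adj):
    """Per-node connectivity: row sums of the adjacency, excluding the diagonal."""
    n = len(adj)
    return [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]


def unsigned_tom(adj):
    """Unsigned topological overlap computed from a symmetric adjacency matrix."""
    n = len(adj)
    k = connectivity(adj)
    tom = [[1.0] * n for _ in range(n)]  # diagonal set to 1 by convention
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Connection strength routed through shared neighbours u.
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
    return tom


# Toy adjacency: nodes 0 and 2 are weakly linked directly (0.1),
# but both are strongly linked to node 1.
adj = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.7],
    [0.1, 0.7, 1.0],
]
tom = unsigned_tom(adj)
print(round(tom[0][2], 3))  # 0.388
print(connectivity(adj))    # node 1 has the largest row sum
```

In this toy case the weak direct link between nodes 0 and 2 is lifted by their shared neighbour, which is the sense in which the TOM smooths over weak or missing connections, while the row sums still describe each node on its own.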


Thank you. I used a power of 6 in a signed network, and the max value of my TOM is about 0.57. But if I want to know which nodes are highly connected in a module and show them in Cytoscape, should the input be the TOM or the adjacency? Thanks


The input is the TOM. Check the tutorial: Exporting to Cytoscape


I think that exportNetworkToCytoscape is a general function that could work on many kinds of matrices. If the input is the TOM, the weight is the TOM value. If the input is the adjacency, the weight is the adjacency value. My question is why we use the adjacency and not the TOM when considering single nodes. So if I want to say which pair is highly connected, should I use the TOM or the adjacency? Thanks
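The point that the export step is agnostic to the kind of matrix can be sketched as follows. This is a hypothetical Python stand-in, not the code of exportNetworkToCytoscape: it walks the upper triangle of any symmetric matrix and, as an assumption, keeps pairs whose value is strictly above the threshold. The two toy matrices are made-up numbers for the same three genes:

```python
def matrix_to_edges(matrix, names, threshold=0.5):
    """Turn a symmetric weight matrix into (from, to, weight) edge tuples."""
    edges = []
    n = len(names)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle: each pair once, no diagonal
            w = matrix[i][j]
            if w > threshold:  # assumption: strict comparison
                edges.append((names[i], names[j], w))
    return edges


names = ["g1", "g2", "g3"]
adjacency = [
    [1.0, 0.80, 0.10],
    [0.80, 1.0, 0.70],
    [0.10, 0.70, 1.0],
]
tom = [
    [1.0, 0.79, 0.39],
    [0.79, 1.0, 0.71],
    [0.39, 0.71, 1.0],
]

adj_edges = matrix_to_edges(adjacency, names, threshold=0.3)
tom_edges = matrix_to_edges(tom, names, threshold=0.3)
print(len(adj_edges), len(tom_edges))  # 2 3
```

Whatever matrix goes in, its values come out as the weight column, so the same threshold can produce different networks from the adjacency and from the TOM.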


For example, if I set a threshold to define whether two genes are linked in a module, do you mean I need to use the TOM value? Thanks


So I think that if we use the TOM to decide whether two probes are linked and then show them in the network, we may lose hub genes. Because hub genes hold the network together but maybe the connection strength is not strong. So which way is better to show the network? Thanks


Because hub genes hold the network together but maybe the connection strength is not strong

This is mostly true in real 'scale-free' networks, which is not the case for transcriptional networks.

Use the TOM as input for exportNetworkToCytoscape, and choose a threshold of 0.1. You can also use the adjacency matrix, it does not matter, but you will probably see nodes with only one connection. Then check in Cytoscape whether the nodes with the highest intra-modular connectivity are still hubs.


Thank you. And I have another question: I find there are two ways to get the TOM. One is net <- blockwiseModules(), then TOM=as.matrix(load(net$TOMFiles)). The other is TOM=TOMsimilarity(adjMat). In the former, the diagonal is 0. In the latter, the diagonal is 1. But the other values are identical. For now I use the former as my TOM, but I think the two should be the same. Thanks


I know we don't need the value on the diagonal, so these two are the same
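A quick way to confirm that two matrices differ only on the diagonal is to compare the off-diagonal entries alone. A minimal sketch in Python with made-up 3×3 values; the helper name `same_off_diagonal` is hypothetical:

```python
def same_off_diagonal(a, b, tol=1e-12):
    """True if two square matrices agree everywhere except possibly the diagonal."""
    n = len(a)
    return all(
        abs(a[i][j] - b[i][j]) <= tol
        for i in range(n)
        for j in range(n)
        if i != j
    )


# Same off-diagonal values; only the diagonal convention differs.
tom_diag_zero = [
    [0.0, 0.79, 0.39],
    [0.79, 0.0, 0.71],
    [0.39, 0.71, 0.0],
]
tom_diag_one = [
    [1.0, 0.79, 0.39],
    [0.79, 1.0, 0.71],
    [0.39, 0.71, 1.0],
]
print(same_off_diagonal(tom_diag_zero, tom_diag_one))  # True
```

Since an edge list only contains pairs of distinct nodes, the diagonal never reaches the exported file either way.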


Use TOMsimilarity.


I'm asking whether the input should be the TOM or adjMat because the networks I get from the two are very different. Based on the TOM, the highest weights always involve the same gene paired with other genes (one gene in the middle of the network). Based on adjMat, the highest weights belong to different pairs of genes. I just think the network plot from the latter looks better: it shows that each gene can be linked to many genes and that they have many interactions with each other. And if I use the TOM and set the threshold too high, the hub genes are lost from the network. If I set the threshold too low, the network becomes too big. So I don't know which way is better. Do you mean in transcriptional network, hub gene is more important (top row sum of adjMat) than top TOM value of gene pairs? In other words, how about we show hub genes and genes with top TOM value with hub genes? which is more biological meaningful? Thanks


As I said before, the network is always the TOM. Clusters are detected using the TOM, and therefore the TOM is what should be visualized in Cytoscape. This is what the WGCNA authors show in the tutorial. If the threshold is too high, use a lower value.

Do you mean in transcriptional network, hub gene is more important (top row sum of adjMat) than top TOM value of gene pairs?

In any 'scale-free' network hub genes are more important than single gene pairs.

In other words, how about we show hub genes and genes with top TOM value with hub genes? which is more biological meaningful?

Biologically speaking, nothing, but the approach makes sense if you want to show the core of the network/module.
