Question

How to avoid duplicate edges between same node pair in a co-expression network

7.0 years ago

aishu.jp ▴ 10

I have control vs treated RNA-seq data from a plant, from which I am trying to construct a gene co-expression network.

The normalized count matrix of the 6000 identified DEGs, obtained after rlog transformation, was passed to the cor() function and Pearson correlation was applied. The pairwise correlation analysis gave ~30 million gene pairs, from which gene pairs were selected with a cutoff >= 0.95.

In the input file for Cytoscape, I notice that some node pairs are repeated with the same correlation value. For example:

geneA geneB 0.967634

geneB geneC 0.976453

geneB geneA 0.967634

Is it possible to get node connections like this with the same correlation value?

If so, should I keep only one pair (geneA geneB 0.967634) and discard the other (geneB geneA 0.967634)?

I'm using Cytoscape for visualizing this gene co-expression network but it is huge. Specifically, it has 11934 nodes and 518282 edges

By default, Cytoscape opens the network in "prefuse force directed layout".

Is prefuse force directed layout a directed graph?

Why does Cytoscape load the network in such a layout?

Is there a better way of visualizing my network in Cytoscape?

Cytoscape RNA-Seq duplicate edges coexpression • 3.9k views

Updated 7.0 years ago by Kevin Blighe • written 7.0 years ago by aishu.jp

score 2 · Answer 1 · 2018-07-14

Is it possible to get node connections like this with the same correlation value?

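Yes. Pearson correlation is symmetric, so cor(geneA, geneB) equals cor(geneB, geneA); if the full correlation matrix is converted to an edge list, every pair appears twice, once in each order. Distinct gene pairs can also share an identical value if you have one or more of these situations: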

low sample numbers
genes of constant variance
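
To illustrate why the mirrored pairs appear, here is a minimal Python sketch (not the poster's R workflow). It checks that Pearson correlation gives the same value in both directions and builds an edge list from only one triangle of the matrix, so each pair is listed once. The expression values and the helper names `pearson` and `upper_triangle_edges` are made up for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical toy expression values (gene -> samples), not real data.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.5],
    "geneB": [1.1, 2.1, 2.9, 4.4],
    "geneC": [4.0, 1.0, 3.5, 2.0],
}


def upper_triangle_edges(expr, cutoff):
    """Return each unordered gene pair once, skipping the diagonal."""
    edges = []
    # combinations() visits (g1, g2) but never (g2, g1) or (g1, g1).
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if r >= cutoff:
            edges.append((g1, g2, r))
    return edges


edges = upper_triangle_edges(expr, cutoff=0.95)
for g1, g2, r in edges:
    print(g1, g2, round(r, 6))
```

For 6000 genes this visits 6000 × 5999 / 2 = 17,997,000 pairs rather than the 36,000,000 cells of the full matrix.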

If so, should I keep only one pair (geneA geneB 0.967634) and discard the other (geneB geneA 0.967634)?

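Yes, keep only one of them. The reversed pair is a duplicate edge, not a self-loop (a self-loop would be geneA geneA, i.e. the diagonal of the correlation matrix); both kinds can be removed later on in Cytoscape. Alternatively, export only the upper or lower triangle of the correlation matrix before importing it into Cytoscape.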

See here: Tutorial:Introduction to Cytoscape 3 (search for 'Options for removing duplicated edges and self-loops')
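
If you would rather clean the edge list before importing it, the following is a small Python sketch under simple assumptions: the edges are already parsed into (source, target, weight) tuples, and the function name `dedupe_edges` is hypothetical. It treats geneA-geneB and geneB-geneA as the same undirected edge and also drops self-loops.

```python
def dedupe_edges(rows):
    """Keep the first occurrence of each undirected edge; drop self-loops."""
    seen = set()
    unique = []
    for source, target, weight in rows:
        if source == target:
            continue  # self-loop (diagonal of the correlation matrix)
        # Sort the two node names so A-B and B-A map to the same key.
        key = tuple(sorted((source, target)))
        if key in seen:
            continue  # mirrored duplicate already kept
        seen.add(key)
        unique.append((source, target, weight))
    return unique


# The three rows from the question.
rows = [
    ("geneA", "geneB", 0.967634),
    ("geneB", "geneC", 0.976453),
    ("geneB", "geneA", 0.967634),
]

unique = dedupe_edges(rows)
for source, target, weight in unique:
    print(source, target, weight)
```

With the rows from the question, the second geneB geneA line is dropped and two edges remain.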

I'm using Cytoscape for visualizing this gene co-expression network but it is huge. Specifically, it has 11934 nodes and 518282 edges

With a network that large, what are you actually aiming to portray? It will likely mean nothing, biologically, but may simply provide for a nice graphic / figure.

By default, Cytoscape opens the network in "prefuse force directed layout".

Is prefuse force directed layout a directed graph?

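No. "Force-directed" refers to how the layout algorithm positions the nodes (by simulating attractive and repulsive forces), not to the direction of the edges, so the confusion lies in the unfortunate use of 'directed' in the name of the algorithm. A correlation network is undirected. Read more about it here: Prefuse Force Directed Layout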

Note that you can add arrows, if relevant, from the Styles: Visualizing Data with Styles.

Why does Cytoscape load the network in such a layout?

It is just the default in your Cytoscape version. You can likely modify this.

Is there a better way of visualizing my network in Cytoscape?

See all layouts here: 11.6. Automatic Layout Algorithms.

Generally, it's advisable to go through the entire tutorial on GitHub: cytoscape/cytoscape-tutorials

Kevin