I have control vs. treated RNA-seq data from a plant, for which I am trying to construct a gene coexpression network. I identified a total of 6000 significantly differentially expressed genes using the DESeq2 R package after applying an FDR cutoff of 0.05.
The normalised count matrix of these 6000 genes, derived after rlog transformation, was passed to the cor() function with Pearson correlation. The pairwise correlation analysis gave ~30 million gene pairs, of which 1380285 gene pairs were selected with a cutoff of >0.95 and visualised in Cytoscape.
While visualising the network in Cytoscape, I observed a self-loop for every gene in the network.
- Is the presence of a self-loop for every gene biologically correct or not?
- If it is not correct, how can I avoid self-loops for all genes in the network and retain only the biologically significant ones?
Thank you sir for your help.
Do any clustering techniques reduce the number of gene pairs and the self-loops?
Clustering algorithms will just cluster whatever data you provide. To remove self-loops, you can use the NetworkAnalyzer plugin for Cytoscape or just remove them in your correlation matrix after you generate it.
For example, you could set all perfect correlations to NA or some low value, such that they will be filtered:
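A minimal sketch of that idea, using a small hand-made correlation matrix. The name cor_mat, the gene labels and the values are all made up for illustration; in practice cor_mat would be the output of cor().

```r
# Toy symmetric correlation matrix (values are illustrative, not real data).
genes <- c("geneA", "geneB", "geneC")
cor_mat <- matrix(c(1.00, 0.97, 0.40,
                    0.97, 1.00, 0.96,
                    0.40, 0.96, 1.00),
                  nrow = 3, dimnames = list(genes, genes))

# Blank out perfect correlations so they cannot pass the cutoff.
cor_mat[cor_mat == 1] <- NA

# Apply the cutoff; which() silently skips the NA entries.
hits <- which(cor_mat > 0.95, arr.ind = TRUE)
```

Here hits holds the row and column indices of the surviving pairs, and none of them is a gene paired with itself.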
Maybe diag is safer?
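The diagonal of the correlation matrix is each gene's correlation with itself, which is where the self-loops come from. diag() addresses those entries by position rather than by value, so it does not rely on an exact floating-point comparison with 1 and it leaves alone any two distinct genes that happen to correlate perfectly. A rough sketch, assuming a genes-by-samples matrix called expr (a made-up stand-in for the rlog matrix):

```r
# Toy genes-by-samples matrix standing in for the rlog-transformed counts.
set.seed(42)
expr <- matrix(rnorm(40), nrow = 4,
               dimnames = list(paste0("gene", 1:4), paste0("sample", 1:10)))

# cor() correlates columns, so transpose to get gene-by-gene correlations.
cor_mat <- cor(t(expr), method = "pearson")

# Remove only the self-correlations; every other entry is untouched.
diag(cor_mat) <- NA
```

Any cutoff applied afterwards, such as cor_mat > 0.95, will then skip the diagonal.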
Other useful functions: lower.tri, upper.tri
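Because a correlation matrix is symmetric, every gene pair appears twice (A-B and B-A) in addition to the diagonal. Keeping only one triangle drops both the duplicates and the self-pairs in one step. A small sketch with made-up values; the names cor_mat and edges and the column names source, target and weight are just illustrative choices:

```r
# Toy symmetric correlation matrix (values are illustrative, not real data).
genes <- c("geneA", "geneB", "geneC")
cor_mat <- matrix(c(1.00, 0.97, 0.40,
                    0.97, 1.00, 0.96,
                    0.40, 0.96, 1.00),
                  nrow = 3, dimnames = list(genes, genes))

# upper.tri() is TRUE strictly above the diagonal (diag = FALSE by default),
# so each pair is counted once and self-pairs are excluded.
pass <- upper.tri(cor_mat) & cor_mat > 0.95
idx <- which(pass, arr.ind = TRUE)

# Build an edge list with one row per gene pair.
edges <- data.frame(source = rownames(cor_mat)[idx[, "row"]],
                    target = colnames(cor_mat)[idx[, "col"]],
                    weight = cor_mat[idx])
```

The resulting edges table can be written out with write.table() and imported into Cytoscape as a network table.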
Good point, zx8754