Question

How to choose the threshold of co-expression for gene expression networks

1

6.8 years ago

elb ▴ 260

Hi guys, I have a question regarding co-expression networks. In particular, I have a marker gene and a list of genes co-expressed with it (its neighbourhood), based on mutual information (MI). This list contains around 1000 genes, which of course is not a manageable number. Is there a way to choose the best or most representative number of neighbour genes according to a threshold on the MI value, for example? I tried ranking the genes from the most to the least correlated, but at some point I have to stop and choose a final number of genes. Is there a way to choose a cut-off point that would be "robust"? I have no idea, because I know that it depends on the final goal, but in my case no experiments are feasible with this huge number of genes.

Could you help me please?

Thanks in advance

gene expression networks threshold • 3.2k views

ADD COMMENT • link updated 6.8 years ago by sandeep.amberkar18 ▴ 50 • written 6.8 years ago by elb ▴ 260

1

You imply that follow-up experiments are the limiting step, so why not rank the genes in a way that's relevant to the experiments or the question to address, and take the top n, with n being whatever is suitable for follow-up? Also, you can use the old elbow rule trick: plot the relevant values in decreasing order and see if there's an elbow. In many real-life data sets, there is a sharp initial decrease followed by a flat part. The point at which the curve flattens, though not always well defined, is usually a good practical cut-off, but it may still give you too many candidates to follow up.
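
The elbow rule can also be applied numerically rather than by eye. The sketch below is one common heuristic, not the only definition of an elbow: it sorts the values in decreasing order, rescales both axes to [0, 1], and picks the point that lies furthest below the straight line joining the first and last points. The function name `elbow_index` is made up for this illustration.

```python
import numpy as np

def elbow_index(scores):
    """Index of the elbow in `scores` after sorting them in decreasing order.

    Heuristic: the elbow is the point furthest below the chord that joins
    the first and last points of the rescaled, sorted curve.
    """
    y = np.sort(np.asarray(scores, dtype=float))[::-1]
    n = len(y)
    span = y[0] - y[-1]
    if n < 3 or span == 0:
        return n - 1  # no elbow can be defined; keep everything
    # Rescale rank and value to [0, 1] so neither axis dominates.
    xn = np.arange(n) / (n - 1)
    yn = (y - y[-1]) / span
    # The chord runs from (0, 1) to (1, 0); 1 - x - y is proportional
    # to how far a point lies below it.
    below = 1.0 - xn - yn
    return int(np.argmax(below))

# Toy example: a sharp initial decrease followed by a flat part.
mi_values = [1.0, 0.8, 0.6, 0.4, 0.2, 0.1, 0.09, 0.08, 0.07, 0.06, 0.05]
k = elbow_index(mi_values)
top_values = sorted(mi_values, reverse=True)[: k + 1]
```

It is still worth plotting the sorted values to check that the reported elbow matches what you see, since on a curve without a clear bend the result is fairly arbitrary.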

ADD REPLY • link 6.8 years ago by Jean-Karim Heriche 27k

0

Thank you very much for your answer. The problem is always the same... there's no clear question, and I was simply asked to make inferences...

ADD REPLY • link 6.8 years ago by elb ▴ 260

1

If you have the input expression dataset that was used to compute MI, you may want to consider doing a randomization test to estimate the type I error rate (false positive rate) as a function of MI threshold. You'd recompute MI in randomly re-assorted input data sets, to determine what the false positive rate is at a given MI or correlation cutoff under the null hypothesis of no associations among expression profiles. You would then pick a cutoff that has a low enough false positive rate to satisfy your application. In large data sets, high MI values will occur by chance, and as data set size increases the false positive rate at any given fixed cutoff MI value can become larger. This approach says nothing about biological significance, but would control your false positive rate.
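
A minimal sketch of such a randomization test, under several assumptions: MI is estimated with a simple histogram (plug-in) estimator, which may differ from whatever estimator produced the original MI values; the null is built by shuffling the marker gene's samples; and the names `binned_mi`, `null_mi_cutoff`, `alpha` and `n_perm` are invented for this illustration. The null distribution should be computed with the same estimator as the observed values.

```python
import numpy as np

def binned_mi(x, y, bins=8):
    """Plug-in mutual information (in nats) from a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def null_mi_cutoff(marker, expr, alpha=0.01, n_perm=200, bins=8, seed=0):
    """MI cutoff with an estimated per-gene false positive rate of `alpha`.

    marker : (n_samples,) expression of the marker gene
    expr   : (n_genes, n_samples) expression of the candidate genes
    """
    rng = np.random.default_rng(seed)
    null = np.empty((n_perm, expr.shape[0]))
    for p in range(n_perm):
        # Shuffling the samples breaks any marker-gene association
        # while keeping each gene's own distribution intact.
        shuffled = rng.permutation(marker)
        null[p] = [binned_mi(shuffled, gene, bins) for gene in expr]
    # Pool the null MI values over genes and take the (1 - alpha) quantile.
    return float(np.quantile(null, 1.0 - alpha))
```

Genes whose observed MI with the marker exceeds the returned cutoff are kept. Lowering `alpha` gives a stricter cutoff and fewer candidates; note that this controls the per-gene false positive rate, so with about 1000 genes tested, `alpha = 0.01` still allows roughly 10 false positives on average.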

ADD REPLY • link 6.8 years ago by Ahill ★ 2.0k

0

Thank you Ahill. In the end I performed the randomization, which seems to be the only satisfactory criterion for choosing a threshold; the result is ultimately a compromise between false positive and false negative findings.

ADD REPLY • link 6.8 years ago by elb ▴ 260

score 0 · Answer 1 · 2018-02-27

Broadly what is it that you wish to achieve?

There is no principled justification for any particular cutoff, be it on correlation or p-value. However, for Spearman correlation there are widely accepted ranges:

Low correlation = 0.2 ~ 0.4
Med correlation = 0.4 ~ 0.6
High correlation = 0.7 ~ 0.9

Perhaps you should look into WGCNA, which is a popular tool for identifying co-expression modules from transcriptomic data. Which brings us back to the original question: what do you want to achieve at the end of it? If you could be more specific, I could help you more.