Question

RNA co-expression: should I use differential co-expression or not?


7.4 years ago

sandrine.muller.research ▴ 30

Hi,

I am new to bioinformatics and have a background in neuroimaging, where we often use a baseline to build our models. That is, all the relationships between variables (correlations or other measures) are inferred from the difference in activity between our condition of interest and the baseline. Although these kinds of differential models are the gold standard in my field, I've heard that differential co-expression analysis in RNA-seq is controversial. Can anyone explain why (difficulty choosing a baseline, etc.) and/or point me to publications that discuss the topic?

Thank you very much!

Sandrine

Tags: RNA-Seq, co-expression

Written 7.4 years ago by sandrine.muller.research ▴ 30 · updated 7.2 years ago by Kevin Blighe 88k
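To make the differential idea in the question concrete: one common formulation of differential co-expression compares the correlation of a gene pair in the condition of interest with the correlation of the same pair in the baseline. The sketch below is a hypothetical illustration, not a method proposed in this thread. It assumes the two conditions are independent sets of samples and uses the Fisher z-transform test for the difference between two independent correlations, with a normal-approximation p-value. The function names are made up for the example.

```python
import math
import random
from statistics import NormalDist


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def differential_correlation(x_cond, y_cond, x_base, y_base):
    """Compare one gene pair's correlation in a condition vs. a baseline.

    Returns (r_cond, r_base, z, p): both correlations, the Fisher z statistic
    for their difference, and a two-sided normal-approximation p-value.
    Assumes independent sample sets and |r| < 1 in both.
    """
    r_cond = pearson(x_cond, y_cond)
    r_base = pearson(x_base, y_base)
    # atanh(r) is approximately normal with variance 1 / (n - 3)
    se = math.sqrt(1 / (len(x_cond) - 3) + 1 / (len(x_base) - 3))
    z = (math.atanh(r_cond) - math.atanh(r_base)) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return r_cond, r_base, z, p


if __name__ == "__main__":
    rng = random.Random(1)
    n = 30
    # Hypothetical data. Condition: gene y tracks gene x. Baseline: unrelated.
    x_cond = [rng.gauss(0, 1) for _ in range(n)]
    y_cond = [xi + rng.gauss(0, 0.5) for xi in x_cond]
    x_base = [rng.gauss(0, 1) for _ in range(n)]
    y_base = [rng.gauss(0, 1) for _ in range(n)]
    print(differential_correlation(x_cond, y_cond, x_base, y_base))
```

Run genome-wide, this gives one test per gene pair, so the p-values would still need multiple-testing correction.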


Hi! I was reading about co-expression networks today, and it really depends on how your experiment is designed and what tools you have at your disposal. For example, WGCNA seems to work really well, but you need quite a large number of samples for a meaningful analysis. On using differentially expressed genes, here is what is written in their FAQ:

WGCNA is designed to be an unsupervised analysis method that clusters genes based on their expression profiles. Filtering genes by differential expression will lead to a set of correlated genes that will essentially form a single (or a few highly correlated) modules. It also completely invalidates the scale-free topology assumption, so choosing soft thresholding power by scale-free topology fit will fail.

I do not know if this is the case for all the tools, but it is definitely something to keep in mind. Cheers :)

Reply by biofalconch ★ 1.3k · 7.4 years ago
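The FAQ's point about DE-filtered genes collapsing into one or a few modules can be illustrated with a toy simulation. This Python sketch is an assumption-laden illustration, not WGCNA itself: it simulates independent genes, adds a case/control shift to a few of them, selects genes with a crude DE proxy (absolute difference of group means, not a proper DE test), and compares their average absolute correlation with that of unselected genes. All parameter values and function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_per_group=10, n_genes=200, n_de=20, shift=2.0, seed=0):
    """Return (labels, genes). Every gene is independent N(0, 1) noise per sample;
    the first n_de genes also get +shift in the 'case' group (label 1)."""
    rng = random.Random(seed)
    labels = [0] * n_per_group + [1] * n_per_group
    genes = []
    for g in range(n_genes):
        effect = shift if g < n_de else 0.0
        genes.append([rng.gauss(0, 1) + effect * lab for lab in labels])
    return labels, genes


def top_by_mean_difference(labels, genes, k):
    """Crude DE proxy: indices of the k genes with largest |mean(case) - mean(control)|."""
    def score(expr):
        case = [e for e, lab in zip(expr, labels) if lab == 1]
        ctrl = [e for e, lab in zip(expr, labels) if lab == 0]
        return abs(sum(case) / len(case) - sum(ctrl) / len(ctrl))
    ranked = sorted(range(len(genes)), key=lambda g: score(genes[g]), reverse=True)
    return ranked[:k]


def mean_abs_correlation(genes, idx):
    """Average |r| over all pairs of genes whose indices are in idx."""
    vals = [abs(pearson(genes[i], genes[j]))
            for a, i in enumerate(idx) for j in idx[a + 1:]]
    return sum(vals) / len(vals)


if __name__ == "__main__":
    labels, genes = simulate()
    selected = top_by_mean_difference(labels, genes, k=20)
    others = [g for g in range(len(genes)) if g not in selected][:20]
    print("mean |r|, selected genes:", round(mean_abs_correlation(genes, selected), 2))
    print("mean |r|, other genes:   ", round(mean_abs_correlation(genes, others), 2))
```

In this toy setup, the selected genes correlate mainly because they share the same group effect, not because of any gene-gene relationship, which is consistent with the FAQ's warning.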


Thank you @biofalconch for your answer! Indeed, I can understand their point. However, don't you think you may get a lot of correlations that arise "by chance" if you are not controlling for random noise (with a baseline)? I guess when you correlate the modules with a disease, for instance, many of the genes in a module could be false positives... or is my reasoning off?

Reply by sandrine.muller.research ▴ 30 · 7.4 years ago
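The concern about chance correlations can be checked with a quick simulation. This hypothetical Python sketch generates genes with no true relationship to each other and counts how many pairs still exceed an arbitrary |r| threshold (0.5 here, an assumption), comparing a small and a large number of samples.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def count_chance_correlations(n_samples, n_genes=50, threshold=0.5, seed=0):
    """Simulate genes with no true co-expression (independent N(0, 1) values)
    and count the gene pairs whose |r| still exceeds threshold."""
    rng = random.Random(seed)
    genes = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_genes)]
    hits = 0
    for i in range(n_genes):
        for j in range(i + 1, n_genes):
            if abs(pearson(genes[i], genes[j])) > threshold:
                hits += 1
    return hits


if __name__ == "__main__":
    total_pairs = 50 * 49 // 2
    for n in (8, 20, 100):
        hits = count_chance_correlations(n)
        print(f"{n:>3} samples: {hits} of {total_pairs} pairs with |r| > 0.5")
```

With few samples, a sizeable share of pairs crosses the threshold purely by chance; with many samples the count drops sharply. P-values with multiple-testing correction, or permutation-based thresholds, are common ways to guard against this.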


Yes! It may be a bad idea to keep the whole, unfiltered dataset, and the first part of the same FAQ answer addresses this (I probably shouldn't have left it out). Here it is:

Probesets or genes may be filtered by mean expression or variance (or their robust analogs such as median and median absolute deviation, MAD) since low-expressed or non-varying genes usually represent noise. Whether it is better to filter by mean expression or variance is a matter of debate; both have advantages and disadvantages, but more importantly, they tend to filter out similar sets of genes since mean and variance are usually related.

So what I got from this is "filter at your own risk".

Reply by biofalconch ★ 1.3k · 7.4 years ago
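A minimal sketch of the filtering step the FAQ describes, using only the Python standard library. It ranks genes by median absolute deviation (MAD, unscaled) or by variance and keeps the top k. The dict-of-lists input and the function names are assumptions for illustration, and the expression values are made up.

```python
from statistics import median, pvariance


def mad(values):
    """Median absolute deviation (unscaled): median of |x - median(x)|."""
    m = median(values)
    return median(abs(v - m) for v in values)


def top_k(genes, stat, k):
    """Names of the k genes with the largest value of stat (ties broken by name)."""
    return sorted(genes, key=lambda name: (-stat(genes[name]), name))[:k]


def overlap(a, b):
    """Fraction of the genes in a that are also in b."""
    return len(set(a) & set(b)) / len(a)


if __name__ == "__main__":
    # Hypothetical expression values, one list of samples per gene
    genes = {
        "flat": [5, 5, 5, 5],
        "noisy": [1, 9, 2, 8],
        "mid": [4, 6, 5, 5],
    }
    by_mad = top_k(genes, mad, 2)
    by_var = top_k(genes, pvariance, 2)
    print(by_mad, by_var, overlap(by_mad, by_var))
```

On real data, the overlap between the MAD-based and variance-based gene sets gives a quick check of the FAQ's remark that both filters tend to keep similar genes.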


WGCNA is indeed fundamentally based on correlation; that's how it initially identifies modules. Once the modules are identified, it decomposes each one by singular value decomposition (i.e. PCA) in order to derive the loadings of each gene on each module. WGCNA is really great in certain situations.

Reply by Kevin Blighe 88k · 6.0 years ago
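The eigengene step can be sketched as follows. This is a hypothetical standard-library Python illustration, not WGCNA's code. It standardizes each gene in a module and takes the first right singular vector of the genes-by-samples matrix Z as the eigengene, computed here by power iteration on Z^T Z. It then flips the vector's sign to agree with the average standardized expression and reports module membership (kME) as each gene's correlation with the eigengene. In WGCNA itself, eigengenes come from `moduleEigengenes()` and kME from `signedKME()`.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardize(values):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return [(v - m) / sd for v in values]


def module_eigengene(module, iters=500, seed=0):
    """First right singular vector of the standardized genes x samples matrix Z,
    found by power iteration on the samples x samples matrix Z^T Z."""
    z = [standardize(g) for g in module]
    n = len(z[0])
    gram = [[sum(row[i] * row[j] for row in z) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]  # random start vector
    for _ in range(iters):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # A singular vector's sign is arbitrary; align it with average expression.
    avg = [sum(row[j] for row in z) / len(z) for j in range(n)]
    if sum(a * b for a, b in zip(v, avg)) < 0:
        v = [-x for x in v]
    return v


def module_membership(module, eigengene):
    """kME: correlation of each gene's profile with the module eigengene."""
    return [pearson(g, eigengene) for g in module]


if __name__ == "__main__":
    # Hypothetical module of three genes measured in six samples
    module = [[1, 2, 3, 4, 5, 6], [2, 4, 6, 8, 10, 13], [1, 2, 4, 4, 5, 6]]
    eg = module_eigengene(module)
    print([round(x, 3) for x in eg])
    print([round(k, 3) for k in module_membership(module, eg)])
```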

Answer · 2017-09-20

Just to throw another couple of ideas out there.

With simple correlation analyses, such as generating a large correlation matrix for all of your variables, you can also derive a P value for each correlation (in R, for example, with cor.test() rather than cor()) to back up whatever values you obtain. You can also plot the values and identify 'structure' in your dataset, as you can see from my first figure below.
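A sketch of a pairwise correlation table with one p-value per pair, in Python. In R, `cor()` returns only the coefficients, while `cor.test()` adds a t-based p-value. Because the Python standard library has no t distribution, this sketch swaps in a two-sided permutation p-value (999 permutations and the seed are arbitrary choices). The gene names and values are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Two-sided permutation p-value for Pearson's r: shuffle y and count how often
    |r| is at least as large as observed; (hits + 1) / (n_perm + 1) avoids p = 0."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def correlation_table(data):
    """(gene_a, gene_b, r, p) for every pair in a dict of name -> expression values."""
    names = list(data)
    rows = []
    for a, first in enumerate(names):
        for second in names[a + 1:]:
            r = pearson(data[first], data[second])
            p = permutation_pvalue(data[first], data[second])
            rows.append((first, second, r, p))
    return rows


if __name__ == "__main__":
    # Hypothetical expression values for three genes across eight samples
    data = {
        "geneA": [1.0, 2.1, 2.9, 4.2, 5.0, 6.1, 6.8, 8.2],
        "geneB": [2.0, 4.1, 6.2, 7.9, 10.1, 12.0, 14.2, 15.9],
        "geneC": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0],
    }
    for row in correlation_table(data):
        print("%s-%s r=%.2f p=%.3f" % row)
```

With thousands of genes, these per-pair p-values would still need multiple-testing correction before being used to back up the matrix.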

Another thing I've been researching this past year is graph theory, minimum spanning trees, and the identification of 'communities' in these. There are functions in R for this in the packages igraph and plotrix. The data are the same as for the correlation matrix. In the correlation plot below, for example, you can see 'blocks' of highly positively and negatively (inversely) correlated samples; these are akin to modules and communities in a network analysis.
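The minimum spanning tree idea can be sketched without igraph. This hypothetical Python example converts correlations to distances (1 - |r|, one of several possible choices) and builds the MST with Kruskal's algorithm. It then cuts MST edges longer than a threshold (0.3, an arbitrary assumption) and treats the remaining connected components as 'communities'. This cut-the-MST heuristic is equivalent to single-linkage clustering. It is simpler than the modularity-based community detection in igraph (for example `cluster_fast_greedy()` or `cluster_walktrap()`).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def make_union_find(items):
    """Return (find, union) closures over a disjoint-set forest."""
    parent = {i: i for i in items}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return False
        parent[ra] = rb
        return True

    return find, union


def minimum_spanning_tree(names, dist):
    """Kruskal's algorithm; dist maps (a, b) pairs to distances.
    Returns the tree as a list of (distance, a, b) edges."""
    _, union = make_union_find(names)
    edges = sorted((d, a, b) for (a, b), d in dist.items())
    return [(d, a, b) for d, a, b in edges if union(a, b)]


def mst_communities(data, cut=0.3):
    """Build the MST on 1 - |r| distances, keep only edges <= cut,
    and return the connected components as sorted lists of names."""
    names = sorted(data)
    dist = {(a, b): 1 - abs(pearson(data[a], data[b]))
            for i, a in enumerate(names) for b in names[i + 1:]}
    find, union = make_union_find(names)
    for d, a, b in minimum_spanning_tree(names, dist):
        if d <= cut:
            union(a, b)
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


if __name__ == "__main__":
    # Hypothetical variables forming two correlated blocks
    data = {
        "a1": [1, 2, 3, 4, 5, 6, 7, 8],
        "a2": [3, 5, 7, 9, 11, 13, 15, 17],
        "a3": [1, 3, 2, 4, 5, 6, 8, 7],
        "b1": [2, 8, 1, 7, 3, 6, 4, 5],
        "b2": [2, 8, 1, 7, 4, 6, 3, 5],
    }
    print(mst_communities(data))
```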