Hello,
I am currently using the hdWGCNA package on single-cell data. I build modules on a scRNA-seq dataset of cancer cells and project them onto healthy cells, in order to identify which modules are "absent" from the healthy cells and therefore "exclusive" to the cancer cells. I read that this kind of analysis requires checking module preservation and reproducibility, so I followed the following tutorial: https://smorabit.github.io/hdWGCNA/articles/module_preservation.html
The output results are Z-statistics and MedianRank values like the ones in the tutorial. I read the paper "Is My Network Module Preserved and Reproducible?", in which Zsummary and MedianRank are mentioned, but not the different extensions Zsummary.qual, Zsummary.pres, MedianRank.qual and MedianRank.pres. While searching on the internet, I found a little information in the following tutorial: https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/ModulePreservation/Tutorials/cholesterolPathway.pdf I gathered that .qual comes from "quality" and .pres from "preservation".
But my questions are the following:
- What is the difference between Zsummary.qual and Zsummary.pres?
- How should these two values be interpreted?
- Is it the same for the analysis of MedianRank.qual and MedianRank.pres?
Thank you in advance for your answers!
Hello andres.firrincieli,
Thank you for your answer.
So if I understand correctly, unlike Zsummary.pres, which measures the preservation of the modules in the other dataset, Zsummary.qual measures the quality of the modules themselves, without taking the projection onto another dataset into account? If that's right, then these two statistics have nothing to do with each other and do not measure the same thing at all, correct?

Pretty much. Long story short, the .qual (quality) stats tell you about the reproducibility of your modules in the reference dataset, while the .pres (preservation) stats tell you about the reproducibility of the reference modules in the test dataset.
Not really. In a certain way the .qual and .pres stats are correlated: typically, very small modules tend to be poorly preserved and also to have very low quality. Therefore, always check the .qual stats before drawing conclusions about the preservation of a reference module in the test dataset.
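For reference, this split is visible directly in the output of the core WGCNA call that hdWGCNA's preservation test builds on. A minimal sketch, assuming `cancerExpr`/`healthyExpr` expression matrices (rows = cells, columns = genes) and a `cancerModuleColors` module assignment are your own objects:

```r
library(WGCNA)

# Reference = cancer cells (where the modules were built),
# test = healthy cells (where preservation is evaluated)
multiExpr  <- list(cancer  = list(data = cancerExpr),
                   healthy = list(data = healthyExpr))
multiColor <- list(cancer = cancerModuleColors)

mp <- modulePreservation(multiExpr, multiColor,
                         referenceNetworks = 1,  # cancer is the reference
                         nPermutations = 200,
                         verbose = 3)

# [[1]] = reference set, [[2]] = test set
pres <- mp$preservation$Z[[1]][[2]]  # per-module Zsummary.pres, etc.
qual <- mp$quality$Z[[1]][[2]]       # per-module Zsummary.qual, etc.
```

Both tables are indexed per module, so you can read `Zsummary.qual` and `Zsummary.pres` side by side for each module.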
Thank you for this clarifying answer!
Hello andres.firrincieli,
Sorry to reopen the discussion but a small doubt persists.
From your answer, I understand that:
If Zsummary.qual is not satisfactory, then the module itself is not really robust, even before considering the projection, and the module should therefore be discarded. Is that right?
And if Zsummary.qual is satisfactory (>10) but Zsummary.pres is poor, should we consider that the module is robust in its reference dataset but that its projection onto the other dataset is not preserved? In that case, do you agree that we can still work with the module itself, just without taking the projection into account?
That is correct
That is correct
I agree.
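The decision rule agreed on above can be sketched as a small helper. This is just an illustration of the commonly cited thresholds from Langfelder et al. (2011) (Zsummary < 2: no evidence of preservation, 2–10: weak to moderate, > 10: strong), not a function from WGCNA or hdWGCNA:

```r
# Sketch: check module quality first, then interpret preservation
assessModule <- function(z.qual, z.pres) {
  if (z.qual < 10) {
    "low-quality module: discard before interpreting preservation"
  } else if (z.pres > 10) {
    "robust module, strongly preserved in the test dataset"
  } else if (z.pres >= 2) {
    "robust module, weak-to-moderate preservation in the test dataset"
  } else {
    "robust in the reference, but not preserved in the test dataset"
  }
}

assessModule(z.qual = 40, z.pres = 1.5)
# "robust in the reference, but not preserved in the test dataset"
```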
Thank you very much!
I'm wondering why I'm getting two different Zsummary.qual results, even though I used the same reference network. I thought Zsummary.qual is independent of the test network and should always yield the same results due to 'internal quality testing'. Am I missing something? To clarify, I created WGCNA networks for a control group (a), intervention group (b), and intervention group (c). I then performed preservation analysis between (b) and (a), using (a) as the reference, and again between (c) and (a), also using (a) as the reference. I expected the Zsummary.qual values to be the same, but they are not at all. This raises several important questions:
- How do we determine whether a module is valid when we get different Zsummary.qual values for the same reference module? For example, if module M1 shows Zsummary.qual = 40 when tested against network (b) but only 5 when tested against network (c), should we consider M1 a "real" module?
- If we're looking for modules that are least preserved between conditions (which could be biologically interesting), how can we be confident in our reference modules when Zsummary.qual varies between comparisons?
- What is the purpose of Zsummary.qual if it changes with each test network? The common threshold of Zsummary.qual > 10 seems problematic when the value isn't stable.
- Should we use additional network-internal quality metrics to validate our reference modules before looking at preservation? If so, which ones?
Any insights would be appreciated. Thanks in advance!
Hi Nils,
The .qual stats should not change. I am not familiar with hdWGCNA, which is for single-cell expression data, so I have a couple of questions about how you ran the b vs a and c vs a comparisons.