I am working on WGCNA for DNA methylation EPIC array data. For co-methylation module detection, I selected the 400,000 CpG probes with the highest variance across 71 samples. However, I always get one very large module, which makes the cluster dendrogram look strange. I don't know what went wrong in my analysis, and I'm new to this kind of work. Any suggestions would be appreciated.
Code:
bwnet = blockwiseModules(datMethy, corType = "pearson", maxBlockSize = 12000,
    deepSplit = 2, networkType = "signed", power = 12, minModuleSize = 30,
    reassignThreshold = 0, mergeCutHeight = 0.25, numericLabels = TRUE,
    saveTOMs = TRUE, pamRespectsDendro = FALSE, saveTOMFileBase = "methyTOM",
    verbose = 3)
What does a basic PCA of these probes look like (both the 71 samples and the 400k probes)? I have seen this numerous times. Sometimes it can be explained by strong and widespread differential methylation between conditions (e.g., treatment with a DNMT inhibitor, or a batch effect), and sometimes it's inexplicable.
In the former case (or in any case where PCA reveals groups) you can either (1) compute TOMs within group and make a consensus or (2) compute networks within groups and compare. (1) is for when the grouping is coincidental or not informative for your questions; and (2) is for when the sample grouping is of biological interest.
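For anyone wanting to run the PCA check asked about above, here is a minimal sketch in base R. It uses a simulated matrix (datMethy_sim is a hypothetical stand-in for the real datMethy, with samples in rows and probes in columns, as WGCNA expects), so the numbers themselves mean nothing; only the steps carry over.

```r
set.seed(1)
n_sample <- 71
n_probe  <- 500   # small stand-in for the 400k probes

# Hypothetical simulated data: samples in rows, probes in columns
datMethy_sim <- matrix(rnorm(n_sample * n_probe), nrow = n_sample,
                       dimnames = list(paste0("S", seq_len(n_sample)),
                                       paste0("cg", seq_len(n_probe))))

# PCA on centered and scaled probes
pca <- prcomp(datMethy_sim, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Sample side: one row per sample (look for groups or batches here)
sample_scores <- pca$x[, 1:2]

# Probe side: one row per probe
probe_loadings <- pca$rotation[, 1:2]

# Plot only in an interactive session
if (interactive()) {
  plot(sample_scores, xlab = "PC1", ylab = "PC2", main = "Samples")
}
```

Coloring the sample plot by condition, batch, or array position is usually the quickest way to see whether a known grouping lines up with the leading components.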
In the "inexplicable" cases I have had success "converting" the above kind of dendrogram via so-called "robust" WGCNA, wherein you would draw 10-20 bootstraps of your 71 samples (generating 10-20 TOMs), and then call the consensusTOM function to merge the results together.
Hi LChart, Thank you very much for your helpful advice, and I'm sorry for the late reply. The PCA plot does not show any obvious grouping of the samples from the two conditions. But the cumulative proportion of variance explained by the top 2 components is only 25.9% (5.44% + 20.46%)... In that case, should I follow the second suggestion (robust WGCNA)? Thanks,
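For reference, the bootstrap-and-consensus idea from the "robust" WGCNA suggestion can be illustrated in base R roughly as follows. This is only a sketch under loud assumptions: the data are simulated, the per-bootstrap network is a plain signed adjacency rather than a TOM, and the merge is an element-wise median standing in for WGCNA's consensusTOM step. The names dat_sim, signed_adj and n_boot are hypothetical.

```r
set.seed(42)
n_sample <- 71
n_probe  <- 50    # tiny stand-in; the real data has far more probes
n_boot   <- 10    # the thread suggests 10-20 bootstraps
power    <- 12    # same soft-thresholding power as in the question

# Hypothetical simulated data: samples in rows, probes in columns
dat_sim <- matrix(rnorm(n_sample * n_probe), nrow = n_sample)

# Signed adjacency: ((1 + cor) / 2)^power
# (stand-in for the TOM that WGCNA would compute per bootstrap)
signed_adj <- function(x, power) ((1 + cor(x)) / 2)^power

# One adjacency matrix per bootstrap resample of the samples
boot_list <- lapply(seq_len(n_boot), function(b) {
  idx <- sample(n_sample, replace = TRUE)   # resample samples with replacement
  signed_adj(dat_sim[idx, , drop = FALSE], power)
})

# Stack into an n_probe x n_probe x n_boot array
boot_adj <- array(unlist(boot_list), dim = c(n_probe, n_probe, n_boot))

# Element-wise median across bootstraps
# (assumption: a simple stand-in for the consensusTOM merge)
consensus <- apply(boot_adj, c(1, 2), median)
```

The point of the resampling is that connections driven by a handful of samples tend to vary between bootstraps, so they are damped in the merged network, while connections supported by most samples survive.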
Can you show the other side of the singular value decomposition (i.e., the probe loadings)?
The loadings will be a (num_probe, num_components) matrix, and can be obtained either by running svd on the scaled methylation data and taking the left components, or by multiplying the scaled methylation data with your (num_sample, num_components) ("right singular vectors") matrix.
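The two routes to the loadings described above can be sketched in base R like this. It assumes the scaled data is arranged as probes in rows and samples in columns, and uses simulated values (datMethy_sim and k are hypothetical names). Note that the second route returns the left singular vectors multiplied by their singular values, so the two results differ by a per-component scale factor.

```r
set.seed(1)
n_sample <- 71
n_probe  <- 500   # small stand-in for the 400k probes
k        <- 2     # number of components to keep

# Hypothetical simulated data: samples in rows, probes in columns
datMethy_sim <- matrix(rnorm(n_sample * n_probe), nrow = n_sample)

# Scale each probe, then transpose to probes x samples
X <- t(scale(datMethy_sim))

# SVD of the scaled data, keeping k left and k right singular vectors
s <- svd(X, nu = k, nv = k)

# Route 1: left singular vectors, a (num_probe, num_components) matrix
loadings_left <- s$u

# Route 2: scaled data times the (num_sample, num_components)
# right singular vectors; equals s$u scaled by the singular values
loadings_proj <- X %*% s$v

# Difference between route 2 and route 1 rescaled by singular values
max_diff <- max(abs(loadings_proj - s$u %*% diag(s$d[1:k])))
```

A histogram or scatter plot of the first one or two columns shows whether a small set of probes dominates the leading components or whether the signal is spread across most of the array.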