Failed to pick soft threshold in WGCNA
7.0 years ago
RC ▴ 20

Hi there,

I'm trying to find co-methylated sites using WGCNA. Methylation status was assessed using the Illumina EPIC microarray. When I ran WGCNA's pickSoftThreshold() on 810135 sites (after some common preprocessing steps, such as filtering out probes with a bad detection P value), the server killed the process and reported: task 1 failed - "cannot allocate vector of size 30.2Gb".

The code for pickSoftThreshold() is:

sft1 <- pickSoftThreshold(resids2, blockSize = 5000, powerVector = c(seq(4, 10, by = 1), seq(12, 20, by = 2)), networkType = "signed", verbose = 5)

Memory of my server: 220 GiB

I have tested a smaller block size, such as 2000, for this step. It takes 12 min per block, so maybe 4800 min (more than 3 days) for the whole dataset. Since this step takes too much time and too much memory, how do you deal with it? Maybe some basic filters? Or can I pick the soft threshold on data subsets?

One study noted that they obtained the power from the scale-free topology criterion calculated on data subsets. However, they didn't give any details about how they selected the subsets (title: Mosaic Epigenetic Dysregulation of Ectodermal Cells in Autism Spectrum Disorder).

Any suggestions?
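A rough check of the numbers in this question, offered as a sketch and not as a statement about WGCNA internals. The assumption is that the failed allocation is a single dense matrix of doubles with one row per site and one column per block member, so it needs nSites x blockSize x 8 bytes. The helper name `block_matrix_gib` is made up for this illustration.

```r
# Numbers taken from the question.
n_sites <- 810135            # probes passed to pickSoftThreshold()

# Size in GiB of one dense (n_sites x block_size) matrix of doubles.
# Assumption: this is the object that could not be allocated.
block_matrix_gib <- function(n_sites, block_size, bytes_per_value = 8) {
  n_sites * block_size * bytes_per_value / 1024^3
}

gib_5000 <- block_matrix_gib(n_sites, 5000)   # blockSize used in the failing call
gib_2000 <- block_matrix_gib(n_sites, 2000)   # smaller blockSize that was tested

# Run-time estimate for blockSize = 2000 at 12 min per block.
n_blocks_2000 <- ceiling(n_sites / 2000)
total_minutes <- n_blocks_2000 * 12
total_days    <- total_minutes / (60 * 24)

cat(sprintf("blockSize 5000: %.1f GiB per matrix\n", gib_5000))
cat(sprintf("blockSize 2000: %.1f GiB per matrix\n", gib_2000))
cat(sprintf("%d blocks, about %.1f days\n", n_blocks_2000, total_days))
```

Under that assumption the 5000-column block comes out at about 30.2 GiB, which is consistent with the reported error. The "task 1 failed" wording suggests the allocation happened in a parallel worker, so several such matrices may need to fit in memory at the same time. Because both memory and the number of blocks grow with the number of sites, reducing the number of probes helps on both fronts.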


I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in the toolbar when you compose or edit a post.

7.0 years ago

> on 810135 sites

That's quite a lot. I have used WGCNA for RNA-seq analysis, which involves far fewer features. It doesn't surprise me that it takes such a long time for a dataset like that. I usually filter out sites with low variance.
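A minimal sketch of such a variance filter in base R, assuming the data are a numeric matrix with samples in rows and probes in columns (the layout WGCNA expects). The function name `filter_top_variance`, the toy matrix, and the number of probes to keep are all hypothetical; a sensible cutoff depends on the dataset.

```r
# Keep the `n_keep` most variable probes (columns) of a samples x probes matrix.
filter_top_variance <- function(dat, n_keep) {
  stopifnot(is.matrix(dat), is.numeric(dat), n_keep >= 1)
  n_keep    <- min(n_keep, ncol(dat))
  probe_var <- apply(dat, 2, var, na.rm = TRUE)   # per-probe variance
  keep      <- order(probe_var, decreasing = TRUE)[seq_len(n_keep)]
  dat[, sort(keep), drop = FALSE]                 # preserve original probe order
}

# Toy data: 20 samples x 100 probes, with the first 10 probes made much noisier.
set.seed(1)
toy <- matrix(rnorm(20 * 100), nrow = 20,
              dimnames = list(paste0("s", 1:20), paste0("cg", 1:100)))
toy[, 1:10] <- toy[, 1:10] * 10

filtered <- filter_top_variance(toy, n_keep = 10)
dim(filtered)
colnames(filtered)
```

On a matrix with hundreds of thousands of columns, `apply()` can be slow; `matrixStats::colVars()` computes the same per-column variances faster if that package is available. The filtered matrix can then be passed to `pickSoftThreshold()` in place of the full one.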

7.0 years ago

In my (limited) experience, the time taken scales roughly with the size of the overall dataset, both genes/CpGs and samples. So, as you mention, it can be helpful to remove low-variance probes and to perform at least the threshold selection step on a subset of the total samples. When selecting the subset, I would probably try to preserve some balance across any heterogeneity in the samples, e.g. case/control, sex, etc.
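One way to draw such a balanced subset is to sample the same fraction from every stratum. The sketch below uses base R only; the function name `stratified_subset`, the sample sheet `pheno`, its columns, and the 50% fraction are all hypothetical and only illustrate the idea.

```r
# Draw the same fraction of samples from every stratum
# (e.g. every case/control x sex combination). Returns row indices.
stratified_subset <- function(strata, frac, seed = 1) {
  stopifnot(frac > 0, frac <= 1)
  set.seed(seed)
  idx_by_stratum <- split(seq_along(strata), strata, drop = TRUE)
  picked <- lapply(idx_by_stratum, function(idx) {
    n_pick <- max(1, round(length(idx) * frac))   # at least one per stratum
    idx[sample.int(length(idx), n_pick)]
  })
  sort(unlist(picked, use.names = FALSE))
}

# Hypothetical sample sheet: 40 samples, 10 in each status x sex combination.
pheno <- data.frame(
  status = rep(c("case", "control"), each = 20),
  sex    = rep(c("F", "M"), times = 20)
)
strata       <- interaction(pheno$status, pheno$sex)
keep_samples <- stratified_subset(strata, frac = 0.5)
table(strata[keep_samples])

# With real data (assumption: rows of resids2 are samples in the same order
# as the rows of the sample sheet), the subset would be used like this:
# sft_sub <- WGCNA::pickSoftThreshold(resids2[keep_samples, ],
#                                     powerVector = c(4:10, seq(12, 20, by = 2)),
#                                     networkType = "signed", blockSize = 2000)
```

The power chosen on the subset would then be applied to the full set of samples for network construction. Whether a subset gives the same scale-free fit as the full data is not guaranteed, so it may be worth repeating the draw with a different seed and checking that the suggested power is stable.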
