I have a dataset of single-cell expression data (at the moment working on CD4 cells only) from 4 patients.
Would 4 patients be enough to get any significant results, considering that my sample number is essentially 1200 cells?
To run WGCNA on such a dataset, you will require a lot of RAM, assuming that you want to run it over the entire transcriptome of each cell. Moreover, I question what exactly the output would mean when compared to that of other methods such as t-SNE, UMAP, pseudo-time analysis, etc.
None of us can stop you going ahead with this, but I just question what exactly it would mean. The aforementioned data reduction methods were designed specifically to reduce the computational burden of processing and interpreting scRNA-seq data. Thus, it may make more sense to run WGCNA on a certain number of principal components that account for an appreciable amount of explained variation, like > 80%.
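As a rough illustration of the "> 80%" idea, here is a minimal sketch that picks the smallest number of leading principal components whose cumulative share of variance exceeds a threshold. It assumes the per-PC variances have already been computed by whatever PCA tool you use; the function name `n_components_for_variance` and the example numbers are hypothetical, not from any real dataset.

```python
from itertools import accumulate


def n_components_for_variance(pc_variances, threshold=0.80):
    """Smallest number of leading PCs whose cumulative variance share exceeds threshold."""
    ordered = sorted(pc_variances, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("variances must sum to a positive value")
    for k, cumulative in enumerate(accumulate(ordered), start=1):
        if cumulative / total > threshold:
            return k
    # Threshold never exceeded (e.g. threshold = 1.0): keep every component.
    return len(ordered)


# Hypothetical per-PC variances, for illustration only.
example_variances = [40.0, 25.0, 10.0, 8.0, 7.0, 5.0, 3.0, 2.0]
n_pcs = n_components_for_variance(example_variances)
print(n_pcs)
```

The retained components (cells x PCs) would then be the input to the network analysis instead of the full gene matrix.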
Actually, the computational expense is not that high, especially if genes with low variability and/or a low expression level are filtered out before the adjacency matrix is built.
The additional information would be the "wiring" of the gene expression network in different clusters, and the identification of potential key driver genes.
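To make the filtering and "wiring" points concrete, here is a small pure-Python sketch. It assumes an unsigned adjacency of the form |cor|^beta (the usual WGCNA soft-thresholding definition) and ranks genes by connectivity, the row sum of the adjacency. The cut-offs, `beta = 6`, the gene names and the toy values are all illustrative assumptions; a real analysis would use the WGCNA package itself.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def filter_genes(expr, min_mean=0.5, min_var=0.1):
    """Drop genes with low mean expression or low variance across cells."""
    return {g: v for g, v in expr.items()
            if mean(v) >= min_mean and pvariance(v) >= min_var}


def connectivity(expr, beta=6):
    """Sum of unsigned soft-thresholded adjacencies |r|^beta for each gene."""
    genes = list(expr)
    k = {g: 0.0 for g in genes}
    for i, gi in enumerate(genes):
        for gj in genes[i + 1:]:
            a = abs(pearson(expr[gi], expr[gj])) ** beta
            k[gi] += a
            k[gj] += a
    return k


# Toy expression values (gene -> one value per cell), purely hypothetical.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.1, 5.9, 8.2, 9.8],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
    "geneD": [0.0, 0.0, 0.1, 0.0, 0.0],  # barely expressed, removed by the filter
}
kept = filter_genes(expr)
k = connectivity(kept)
for gene in sorted(k, key=k.get, reverse=True):
    print(gene, round(k[gene], 4))
```

Genes at the top of this ranking are candidate hubs. Because the loop is quadratic in the number of genes, filtering first is what keeps the cost manageable.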
Hi Kevin,
I have filtered my dataset for low counts, so I have ended up with ~850 genes, and WGCNA runs quite smoothly but the module-trait correlations that I see are quite weak. I was wondering if that is because I am working with so few genes or because all those cells come from only 4 patients.
Penny
Could be a few reasons. So, you have 850 genes x ~1200 cells? I'm still not sure that WGCNA is best for scRNA-seq data, and I believe running WGCNA on PC eigenvectors would be better (as I explain in my answer). The cellular heterogeneity that comes with scRNA-seq datasets may be what is 'beating' WGCNA in this case, along with the fact that you are effectively dealing with 4 batches (4 samples). Or have you run it on the 'integrated' dataset after adjustment for batch?
You are literally the first person that I have ever heard of using WGCNA on scRNA-seq data.
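To illustrate the batch point, the sketch below centers one gene's values within each patient, so that patient-level shifts do not drive the correlations. This is only a crude stand-in for a proper integration or batch-correction method, and the function name `center_by_patient`, the patient labels and the values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def center_by_patient(values, patient_ids):
    """Subtract each patient's mean from that patient's cells (one gene at a time)."""
    groups = defaultdict(list)
    for value, patient in zip(values, patient_ids):
        groups[patient].append(value)
    patient_means = {p: mean(vs) for p, vs in groups.items()}
    return [value - patient_means[patient]
            for value, patient in zip(values, patient_ids)]


# Hypothetical expression of one gene in six cells from two patients.
values = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
patients = ["P1", "P1", "P1", "P2", "P2", "P2"]
centered = center_by_patient(values, patients)
print(centered)
```

In the toy input, the large offset between the two patients disappears after centering, leaving only the within-patient variation.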
Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. This comment belongs under Kevin's answer. SUBMIT ANSWER is for new answers to the original question.
To me, that is enough. At some point, each cell can be used as a sample.