Firstly, the FAQ (frequently asked questions) written by the author of WGCNA states that any type of normalised RNA-seq data can be used for WGCNA:
4. Can WGCNA be used to analyze RNA-Seq data?
Yes. As far as WGCNA is
concerned, working with (properly normalized) RNA-seq data isn't
really any different from working with (properly normalized)
microarray data.
We suggest removing features whose counts are consistently low (for
example, removing all features that have a count of less than say 10
in more than 90% of the samples) because such low-expressed features
tend to reflect noise and correlations based on counts that are mostly
zero aren't really meaningful. The actual thresholds should be based
on experimental design, sequencing depth and sample counts.
We then recommend a variance-stabilizing transformation. For example,
package DESeq2 implements the function
varianceStabilizingTransformation which we have found useful, but one
could also start with normalized counts (or RPKM/FPKM data) and
log-transform them using log2(x+1). For highly expressed features, the
differences between full variance stabilization and a simple log
transformation are small.
Whether one uses RPKM, FPKM, or simply normalized counts doesn't make
a whole lot of difference for WGCNA analysis as long as all samples
were processed the same way. These normalization methods make a big
difference if one wants to compare expression of gene A to expression
of gene B; but WGCNA calculates correlations for which gene-wise
scaling factors make no difference. (Sample-wise scaling factors of
course do, so samples do need to be normalized.)
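The filtering and log-transformation steps described in the FAQ can be sketched in base R roughly as follows. This is only a minimal illustration: the count matrix is made up, the thresholds (10 counts, 90% of samples) are the FAQ's example values rather than recommendations for any particular dataset, and the object names are my own.

```r
# Toy count matrix: rows are features (genes), columns are samples.
# All values are made up for illustration.
counts <- rbind(
  geneA = c(120, 340, 95, 410, 230, 180, 75, 300, 260, 150),
  geneB = c(0, 2, 0, 1, 0, 0, 3, 0, 1, 0),
  geneC = c(15, 8, 22, 30, 12, 9, 40, 18, 25, 11)
)
colnames(counts) <- paste0("sample", 1:10)

min_count <- 10      # example count threshold from the FAQ
max_low_frac <- 0.9  # example fraction of samples from the FAQ

# Fraction of samples in which each feature is below the count threshold
low_frac <- rowMeans(counts < min_count)

# Drop features that are low in more than 90% of the samples
keep <- low_frac <= max_low_frac
filtered <- counts[keep, , drop = FALSE]

# Simple log transformation, as an alternative to full variance stabilization
log_expr <- log2(filtered + 1)

# WGCNA functions generally expect samples in rows and genes in columns
dat_expr <- t(log_expr)
```

Here geneB is removed because it is below 10 counts in every sample, while geneA and geneC are kept. In a real analysis the input would be normalized counts rather than raw counts, as the FAQ notes.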
If you want to download expression in the form of Z-scores (also suitable for WGCNA) for just a bunch of genes of interest, then you can use cBioPortal by my colleagues at MSKCC. The R implementation of this is CGDSR.
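To see why gene-wise rescaling such as Z-scoring does not matter for the correlations that WGCNA computes, while sample-wise effects do, here is a small base R sketch. The expression values and the sample shifts are invented, and the helper `z` is just an illustrative name.

```r
# Toy log-scale expression for two genes across six samples (made-up values)
geneA <- c(5.1, 6.3, 5.8, 7.2, 6.9, 5.5)
geneB <- c(9.0, 8.4, 9.9, 9.1, 8.8, 9.5)

r_raw <- cor(geneA, geneB)

# Gene-wise Z-score: subtract the gene's mean, divide by its standard deviation
z <- function(x) (x - mean(x)) / sd(x)
r_z <- cor(z(geneA), z(geneB))

# Pearson correlation is unchanged by a positive linear rescaling of either gene
all.equal(r_raw, r_z)

# Hypothetical un-normalized sample effects: on the log scale, a sample-wise
# scaling factor is an additive shift applied to every gene in that sample
sample_shift <- c(0, 0, 3, 0, 0, -3)
r_shifted <- cor(geneA + sample_shift, geneB + sample_shift)

# The shared sample effect changes the correlation between the two genes
c(raw = r_raw, shifted = r_shifted)
```

In this toy example the two genes are negatively correlated before the shift and strongly positively correlated after it, which is the reason samples need to be normalized even though gene-wise scaling can be ignored.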
Dear Dr Blighe, I also have this question about microarray data. I used a microarray dataset normalised with the GCRMA algorithm. Should I also apply log2(x+1) before using the microarray Exprdata as input for WGCNA? After the log2(x+1) transformation, the maximum and minimum of my Exprdata are 16 and 0, respectively. I would appreciate it if you could share your comments.
When you run gcrma(), the output should be the data that you will then use for WGCNA. It should be log2. You do not have to increment (+1) the data.
Sorry, I am a bit confused. Based on your experience, should I apply log2() to the gcrma() output before using it in WGCNA or not?
Hello, the process goes like this:
Log2 expression levels are produced after step #2
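If you are unsure whether a matrix is already on the log2 scale, one quick check is to look at its range before deciding whether to transform it. The sketch below is a rough heuristic only: the helper `looks_log2` and its cutoff of 30 are my own assumptions, not part of gcrma or WGCNA, and the intensities are made up.

```r
# Hypothetical helper: guess whether an expression matrix is already log2-scale.
# The cutoff of 30 is an assumption for illustration, not a rule from any package.
looks_log2 <- function(x, cutoff = 30) {
  max(x, na.rm = TRUE) < cutoff
}

# Made-up intensities on the raw scale, and the same values on the log2 scale
raw_expr <- matrix(c(12, 250, 4000, 65535, 90, 1800), nrow = 2)
log_expr <- log2(raw_expr)

looks_log2(raw_expr)  # FALSE
looks_log2(log_expr)  # TRUE

# Log-transform only when the data are not already on the log scale
expr_for_wgcna <- if (looks_log2(raw_expr)) raw_expr else log2(raw_expr)
```

Values that run into the thousands suggest raw intensities, whereas log2 microarray data typically tops out around 16. Applying log2 a second time to data that are already logged would compress the range and should be avoided.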