What kind of data can be used directly with WGCNA?
7.0 years ago
a511512345 ▴ 190

Hello, I want to use WGCNA on RNA-seq data from TCGA, but I do not know which form of the data to download. Should it be PFKM or FPKM-UQ? More generally, what kind of data can be used directly as WGCNA input, and how do I get it? Is there a website from which it can be downloaded directly? Looking forward to your answer. Thank you very much.

7.0 years ago

Dear friend, I presume that you mean FPKM, not PFKM?

Firstly, the FAQ (frequently asked questions) written by the author of WGCNA states that any type of normalised RNA-seq data can be used for WGCNA:

4. Can WGCNA be used to analyze RNA-Seq data?

Yes. As far as WGCNA is concerned, working with (properly normalized) RNA-seq data isn't really any different from working with (properly normalized) microarray data.

We suggest removing features whose counts are consistently low (for example, removing all features that have a count of less than say 10 in more than 90% of the samples) because such low-expressed features tend to reflect noise and correlations based on counts that are mostly zero aren't really meaningful. The actual thresholds should be based on experimental design, sequencing depth and sample counts.

We then recommend a variance-stabilizing transformation. For example, package DESeq2 implements the function varianceStabilizingTransformation which we have found useful, but one could also start with normalized counts (or RPKM/FPKM data) and log-transform them using log2(x+1). For highly expressed features, the differences between full variance stabilization and a simple log transformation are small.

Whether one uses RPKM, FPKM, or simply normalized counts doesn't make a whole lot of difference for WGCNA analysis as long as all samples were processed the same way. These normalization methods make a big difference if one wants to compare expression of gene A to expression of gene B; but WGCNA calculates correlations for which gene-wise scaling factors make no difference. (Sample-wise scaling factors of course do, so samples do need to be normalized.)

[source: https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/faq.html]
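To make the FAQ's advice concrete, here is a minimal sketch in plain Python (not the R code you would actually run alongside WGCNA). It shows the low-count filter with the FAQ's example thresholds, the log2(x+1) transform, and a small check that Pearson correlation ignores a gene-wise scaling factor but not sample-wise ones. The gene names, counts, scaling factors, and function names are all made up for illustration.

```python
import math

def filter_low_counts(counts, min_count=10, max_low_fraction=0.9):
    """Drop features whose count is below min_count in MORE than
    max_low_fraction of the samples (the FAQ's example thresholds)."""
    kept = {}
    for feature, values in counts.items():
        n_low = sum(1 for v in values if v < min_count)
        if n_low / len(values) <= max_low_fraction:
            kept[feature] = values
    return kept

def log2_plus_one(counts):
    """Simple log transform, log2(x + 1), applied to every value."""
    return {f: [math.log2(v + 1) for v in vals] for f, vals in counts.items()}

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical normalized counts: 3 genes x 10 samples.
counts = {
    "geneA": [120, 95, 300, 210, 80, 150, 400, 175, 60, 220],
    "geneB": [30, 80, 75, 20, 18, 90, 110, 42, 55, 58],
    "geneC": [0, 0, 1, 0, 0, 0, 0, 0, 0, 3],  # below 10 in every sample
}

filtered = filter_low_counts(counts)   # geneC is removed
log_expr = log2_plus_one(filtered)     # values one could pass on to WGCNA

# Gene-wise scaling (e.g. a per-gene length factor) leaves correlation unchanged.
a, b = filtered["geneA"], filtered["geneB"]
r_raw = pearson(a, b)
r_gene_scaled = pearson([v * 0.001 for v in a], b)

# Sample-wise scaling (e.g. un-normalized library sizes) does change it.
sample_factors = [1, 2, 0.5, 3, 1, 0.25, 1, 4, 1, 2]
r_sample_scaled = pearson(
    [v * s for v, s in zip(a, sample_factors)],
    [v * s for v, s in zip(b, sample_factors)],
)
```

The invariance check is done on the untransformed values on purpose: a positive gene-wise factor cancels exactly in Pearson correlation, whereas after log2(x+1) it would only cancel approximately.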

If you want to download expression data in the form of Z-scores (also suitable for WGCNA) for just a handful of genes of interest, then you can use cBioPortal, developed by my colleagues at MSKCC. Its R interface is the cgdsr package.

Best of luck, Kevin


Dear Dr Blighe, I have the same question about microarray data. I am using a microarray dataset normalized with the GCRMA algorithm. Should I also apply log2(x+1) before using the microarray expression data (Exprdata) as input for WGCNA? After the log2(x+1) transformation, the maximum and minimum of my Exprdata are 16 and 0, respectively. I would appreciate it if you could share your thoughts.


When you run gcrma(), the output is the data that you will then use for WGCNA. It should already be on the log2 scale, so you do not have to increment (+1) the data or log-transform it again.


Sorry, I am a bit confused. Based on your experience, should I apply log2() to the gcrma() output before using it in WGCNA, or not?


Hello, the process goes like this:

  1. Raw data
  2. gcrma
  3. WGCNA

Log2 expression levels are produced after step #2, so no further log transformation is needed before step #3.
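If you are ever unsure whether a matrix is already on the log2 scale, a rough sanity check on its value range can help before deciding whether to transform. The sketch below is in plain Python and is only a rule of thumb: the ceiling of 20 is my own assumption (log2 intensities from a 16-bit scanner cannot exceed 16), the function names are hypothetical, and nothing here comes from gcrma or WGCNA.

```python
import math

def looks_log2_scaled(values, ceiling=20.0):
    """Assumed heuristic: if no value exceeds `ceiling`, treat the data
    as already log2-scaled. Linear-scale data with an unusually small
    range would fool this check."""
    return max(values) <= ceiling

def to_log2_if_needed(values):
    """Return the values unchanged if they look log2-scaled,
    otherwise apply log2(x + 1)."""
    if looks_log2_scaled(values):
        return list(values)
    return [math.log2(v + 1) for v in values]

# Hypothetical examples: one log2-scale vector, one linear-scale vector.
already_log = [2.3, 7.9, 13.4, 15.8]
linear = [4.0, 240.0, 10800.0, 65535.0]

unchanged = to_log2_if_needed(already_log)   # left as-is
transformed = to_log2_if_needed(linear)      # log2(x + 1) applied
```

A check like this does not replace knowing what the normalization step returns; it only flags the obvious case where values are far too large to be log2 intensities.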
