Hi everyone, I want to create an expression matrix as input for WGCNA. However, I have been told to use RPKM/FPKM data instead of CPM. How can I obtain my TCGA data as RPKM/FPKM with GDCquery? Also, how can I filter the expression set by FDR down to fewer than 5000 genes, which is said to be ideal for WGCNA? I have 17000 genes in the expression set, but I cannot add p-values without losing the expression set.
Dear Kevin, thank you for your helpful comment. I wanted to take the DEGs identified with edgeR/limma-voom (TMM-normalized) and extract their expression data from norm_counts (below) for WGCNA, an approach used in several good papers (e.g. PMC6660050). However, part 2 of the FAQ says "We do not recommend filtering genes by differential expression". Is that approach wrong? Can I instead use "norm" from the following edgeR code, which is log CPM, even though it has 20000 genes? Isn't that too many for WGCNA? Could I just apply another filter based on adjusted p-value, since the top 5000 genes are said to be good for WGCNA?
Thank you so much, I'm confused by these points.
Hi, well, what do you hope to achieve by doing WGCNA? There is a stark difference between running WGCNA on DEGs compared to running WGCNA on all genes. It depends on what you are hoping to achieve by using WGCNA.
Actually, I want to find significant modules related to TNM stage (I-IV) and then use these genes for Venn diagrams and downstream analysis. For this purpose, are 20000 genes fine, or do I have to take the top 5000 based on ANOVA? I also think the papers that use DEGs for WGCNA have done the analysis incorrectly, since this is not recommended by the WGCNA authors. What do you think?
I see, then you should do WGCNA unfiltered, and then correlate the module eigengenes to TNM stage. If, for example, the green module statistically significantly correlates to TNM stage, then you would explore those genes comprising the green module.
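To make the module-trait step concrete, here is a minimal sketch in Python, used purely for illustration since the actual workflow in this thread is in R. It assumes the module eigengenes have already been computed (for example by WGCNA) and that TNM stage is coded numerically as 1 to 4; the eigengene values, module names, and helper function are all made up for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Assumption: TNM stage I-IV encoded as 1-4, one value per sample.
stage = [1, 1, 2, 2, 3, 3, 4, 4]

# Hypothetical module eigengenes (one value per sample, same sample order).
module_eigengenes = {
    "green": [-0.40, -0.35, -0.10, -0.05, 0.10, 0.15, 0.30, 0.35],
    "blue": [0.20, -0.30, 0.10, -0.25, 0.30, -0.15, 0.25, -0.15],
}

# Correlate every module eigengene with stage.
correlations = {
    module: pearson(eigengene, stage)
    for module, eigengene in module_eigengenes.items()
}

# The module with the largest absolute correlation is the first candidate to explore.
best_module = max(correlations, key=lambda m: abs(correlations[m]))

for module, r in correlations.items():
    print(f"{module}: r = {r:.2f}")
print("strongest module:", best_module)
```

This only ranks modules by correlation; it does not test significance. In the R package, p-values for these correlations are typically obtained with functions such as corPvalueStudent, and the genes of any significant module are then followed up.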
Dear Kevin, thank you kindly for your valuable help. May I also ask your opinion on this recent post related to WGCNA?
WGCNA for different stages (I-IV)
Thank you so much
Dear Kevin, after WGCNA analysis (step-by-step method) on 17000 genes I got this dendrogram, which had 10035 genes in module 0 (grey). Do I have to reduce the number of genes before WGCNA with another filtering method? I think this is not a good figure for a paper, is it?
Hi, it is more important what the data means after you do the module-trait correlations / relationships. However, the fact that ~10000 genes are assigned to grey tells me that you should do more rigorous filtering of your input data.
Dear Kevin, which method is better for such filtering? Is my input data file okay? My data.csv was generated with the edgeR/limma method, after TMM normalization and applying voom for log transformation.
Thank you for your valuable comments.
I would filter at the raw count stage. For example, filter out genes with mean raw count < 20.
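As a rough sketch of that filter (again in Python only for illustration, while the real pipeline here is edgeR/limma in R), the idea is to drop low-count genes before any normalization. The gene names and counts below are invented, and the helper name is hypothetical.

```python
from statistics import mean

# Hypothetical raw count matrix: gene -> raw counts across samples (made-up values).
raw_counts = {
    "GENE_A": [0, 3, 1, 2],
    "GENE_B": [150, 210, 180, 95],
    "GENE_C": [25, 18, 30, 22],
    "GENE_D": [19, 20, 21, 19],
}

MIN_MEAN_COUNT = 20  # threshold taken from the suggestion above


def filter_by_mean_count(counts, min_mean):
    """Keep only genes whose mean raw count across samples is at least min_mean."""
    return {
        gene: values
        for gene, values in counts.items()
        if mean(values) >= min_mean
    }


filtered = filter_by_mean_count(raw_counts, MIN_MEAN_COUNT)
print(f"kept {len(filtered)} of {len(raw_counts)} genes:", sorted(filtered))
```

TMM normalization and the voom transformation would then be run on the filtered counts, and the resulting matrix used as the WGCNA input.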