Could someone explain why, in WGCNA, if I have 17,000 genes with p-value < 0.05, people choose only the top k genes, say the top 5,000? How can I filter the information? Could I use the whole list of 17,000 genes as input to WGCNA?
why in WGCNA if I have 17.000 genes with p-value <0.05, people just choose a topk of genes?
This is sometimes done to remove lowly expressed genes, genes with low variance or any other potential indicator of noise. These genes won't have a strong impact on the network.
how can I filter the information?
This can be done by filtering for mean expression, variance or ranked connectivity.
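As a rough illustration of the mean-expression and variance filters mentioned above, here is a minimal sketch in plain Python (WGCNA itself is an R package, so this only shows the logic, not WGCNA code). The function name `filter_genes`, the `min_mean` threshold, the `top_k` cutoff and the toy expression values are all made up for the example; nothing here is a recommended setting.

```python
from statistics import mean, variance


def filter_genes(expr, min_mean=1.0, top_k=5000):
    """Return up to top_k gene names, ranked by variance across samples.

    expr is a dict mapping gene name -> list of expression values
    (one value per sample). min_mean and top_k are illustrative
    assumptions, not values prescribed by WGCNA.
    """
    # Step 1: drop lowly expressed genes (mean below the threshold).
    expressed = {g: v for g, v in expr.items() if mean(v) >= min_mean}
    # Step 2: rank the remaining genes by sample variance, highest first.
    ranked = sorted(expressed, key=lambda g: variance(expressed[g]), reverse=True)
    # Step 3: keep only the top_k most variable genes.
    return ranked[:top_k]


# Toy data: four hypothetical genes measured in four samples.
expr = {
    "geneA": [5.0, 9.0, 1.0, 7.0],  # expressed and variable
    "geneB": [0.1, 0.2, 0.1, 0.0],  # lowly expressed, removed in step 1
    "geneC": [6.0, 6.1, 5.9, 6.0],  # expressed but nearly constant
    "geneD": [2.0, 8.0, 3.0, 9.0],  # expressed and variable
}

selected = filter_genes(expr, min_mean=1.0, top_k=2)
print(selected)
```

Note that both filters use only the expression values themselves, not the sample groups or any test result, so the selection does not depend on the experimental factors.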
could I use the whole list of 17 thousand genes as input to WGCNA?
Yes. The genes that would be filtered out above most likely won't be assigned to a module when you use the full dataset, or they will have low membership in multiple modules.
Thanks, Dr. Warner, for your answer. However, I have an extra question: why choose 8000 instead of 2000 genes, for example? How can I justify my selection of top genes?
Here I'm attaching the piece of code
Thanks, Adriana
Please use ADD COMMENT or ADD REPLY to respond to earlier posts, so that this thread remains logically structured and easy to follow. I have now moved your post, but as you can see it's not optimal. Adding an answer should only be used for providing a solution to the question asked.
I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button. When you compose or edit a post, that button is in your toolbar.
If I'm understanding this correctly, this would filter based on the results from your ANOVA. This isn't really recommended, since it will give you modules that basically correlate with your factors rather than reflect the co-expression network. Please refer to point 2:
https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/faq.html
To answer your question though, there is no rule for choosing 8000 genes over 2000 genes. This is a bit arbitrary.