WGCNA data input
7.6 years ago

Could someone explain why, in WGCNA, when I have 17,000 genes with p-value < 0.05, people choose only the top k genes, say the top 5,000? How should I filter the data? Could I use the whole list of 17,000 genes as input to WGCNA?

Thanks

Adriana

RNA-Seq • 3.1k views

Thanks, Dr. Warner, for your answer. However, I have a follow-up question: why choose 8000 instead of 2000 genes, for example? How can I justify my selection of top genes?

Here I'm attaching the piece of code:

#====================================================================================================================================
# select samples (column indices of dat2log: 4 groups of 4 samples)
sample.id <- c(6:9, 14:17, 22:25, 30:33)

# per-gene one-way ANOVA across the 4 groups
expr0 <- dat2log[, sample.id]
group <- factor(rep(1:4, each = 4))      # group label for each selected sample
temp.anova <- function(x, fa) {
  fit <- lm(x ~ fa)
  return(anova(fit)$`Pr(>F)`[1])         # p-value of the group effect
}
pvalues <- apply(expr0, 1, temp.anova, fa = group)
cutoff <- 0.05 / length(pvalues)         # Bonferroni threshold (not used below)
length(pvalues)

#=======================================================================================================================================

# select top k genes for WGCNA
topk <- 8000
gene.id <- which(rank(pvalues) <= topk)
length(gene.id)

# WGCNA expects samples in rows and genes in columns
expr <- t(dat2log[gene.id, sample.id])

Thanks Adriana


Please use ADD COMMENT or ADD REPLY to respond to earlier posts, so that the thread stays logically structured and easy to follow. I have now moved your post, but as you can see the result is not optimal. Adding an answer should only be used for providing a solution to the question asked.

I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in your toolbar when you compose or edit a post.


If I understand this correctly, it filters genes based on the results of your ANOVA. This isn't really recommended, since it tends to give you modules that simply track your experimental factors rather than reflecting the co-expression network. Please refer to point 2 of the WGCNA FAQ:

https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/faq.html

To answer your question though, there is no rule for choosing 8000 genes over 2000 genes. This is a bit arbitrary.
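To make the concern about ANOVA-based filtering concrete, here is a small simulation sketch. It is not taken from the FAQ; the data are pure noise with the same 4 groups of 4 samples as in the posted code, and all names (`noise`, `top_k`, `mean_abs_cor`) are made up for illustration. It compares how strongly genes correlate with each other after selecting by ANOVA p-value versus picking the same number of genes at random.

```r
# Simulated noise only: 2000 genes x 16 samples, 4 groups of 4, no real group effect.
set.seed(42)
n_genes <- 2000
group <- factor(rep(1:4, each = 4))
noise <- matrix(rnorm(n_genes * length(group)), nrow = n_genes)

# Per-gene one-way ANOVA p-value, as in the code posted above.
pvals <- apply(noise, 1, function(x) anova(lm(x ~ group))$`Pr(>F)`[1])

# Mean absolute pairwise correlation among a set of genes (rows).
mean_abs_cor <- function(m) {
  r <- cor(t(m))
  mean(abs(r[upper.tri(r)]))
}

top_k <- 100
anova_top  <- order(pvals)[1:top_k]     # genes selected by ANOVA p-value
random_set <- sample(n_genes, top_k)    # same number of genes picked at random

mean_abs_cor(noise[anova_top, ])   # correlation among ANOVA-selected genes
mean_abs_cor(noise[random_set, ])  # baseline among randomly chosen genes
```

Even though no gene here carries real signal, the ANOVA-selected set should show a higher average correlation than the random set, because the selection step itself favours genes whose values happen to line up with the group labels. That is the kind of design-driven structure the reply warns about.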

7.6 years ago
Jake Warner ▴ 840

why in WGCNA if I have 17.000 genes with p-value <0.05, people just choose a topk of genes?

This is sometimes done to remove lowly expressed genes, low-variance genes, or genes flagged by any other indicator of noise. Such genes won't have a strong impact on the network.

how can I filter the information?

This can be done by filtering for mean expression, variance or ranked connectivity.
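As a rough sketch of the mean-expression and variance filters mentioned here, the snippet below uses base R on toy data. The matrix `expr_mat` and the cutoffs `min_mean` and `top_n` are hypothetical placeholders, not recommended values. Note that neither filter looks at the group labels.

```r
# Toy data standing in for a log-expression matrix (genes in rows, samples in columns).
set.seed(1)
expr_mat <- matrix(rnorm(200 * 16, mean = 5, sd = 1), nrow = 200,
                   dimnames = list(paste0("gene", 1:200), paste0("s", 1:16)))

min_mean <- 4     # hypothetical mean-expression cutoff
top_n    <- 50    # hypothetical number of most-variable genes to keep

gene_mean <- rowMeans(expr_mat)
gene_var  <- apply(expr_mat, 1, var)

# 1) drop lowly expressed genes; 2) keep the top_n most variable of the rest
expressed <- names(gene_mean)[gene_mean >= min_mean]
keep <- head(expressed[order(gene_var[expressed], decreasing = TRUE)], top_n)

# WGCNA expects samples in rows and genes in columns
dat_expr <- t(expr_mat[keep, ])
dim(dat_expr)
```

With real data, `expr_mat` would be something like `dat2log[, sample.id]` from the question. The cutoffs remain a judgment call, as noted elsewhere in the thread.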

could I use the whole list of 17 thousand genes as input to WGCNA?

Yes. The genes that would have been filtered out above most likely won't be assigned to a module when you use the full dataset, or they will have low membership in multiple modules.
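For what running on an unfiltered gene set might look like, here is a minimal sketch using toy data. It assumes the WGCNA package is installed and is skipped otherwise. The soft-threshold `power = 6` and the other settings are placeholder assumptions (the power would normally be chosen with `pickSoftThreshold()`), and `dat_expr`, `latent`, and `module_labels` are illustrative names.

```r
# Toy data: 16 samples x 300 genes; the first 60 genes share a latent signal.
set.seed(7)
n_samples <- 16
n_genes   <- 300
latent    <- rnorm(n_samples)
dat_expr  <- matrix(rnorm(n_samples * n_genes), nrow = n_samples,
                    dimnames = list(paste0("s", 1:n_samples),
                                    paste0("gene", 1:n_genes)))
dat_expr[, 1:60] <- dat_expr[, 1:60] + 2 * latent   # planted co-expressed block

module_labels <- NULL
if (requireNamespace("WGCNA", quietly = TRUE)) {
  library(WGCNA)
  # Drop genes/samples with too many missing values or zero variance.
  gsg <- goodSamplesGenes(dat_expr, verbose = 0)
  dat_expr <- dat_expr[gsg$goodSamples, gsg$goodGenes]

  net <- blockwiseModules(dat_expr,
                          power = 6,            # assumed, not tuned
                          TOMType = "unsigned",
                          minModuleSize = 30,
                          mergeCutHeight = 0.25,
                          numericLabels = TRUE,
                          maxBlockSize = 5000,  # larger values need more memory
                          verbose = 0)
  module_labels <- net$colors
  print(table(module_labels))   # label 0 = genes not assigned to any module
}
```

With `numericLabels = TRUE`, genes labelled 0 are the unassigned ones, which is where noisy genes would be expected to end up when nothing is filtered beforehand. For around 17,000 genes, `maxBlockSize` determines whether the network is built in one block or split into several, depending on available memory.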
