Dear all,
I have whole-transcriptome RNA-Seq data for 20 paired samples (10 controls and 10 stimulated with an agent). I have ~2,500 differentially expressed genes between the controls and the stimulated samples, and I would like to identify smaller modules of genes acting together within this large set. Ideally, these would then be matched with a common upstream transcription factor or other regulator. I would like to perform a co-expression analysis on my differentially expressed genes, but I am not sure what the best way is to do this with paired samples.
- What is the best input data for calculating correlations and co-expression? I have considered: the read counts mapped to my genes (that would not take into account pairing between samples), the difference in read counts between my stimulated sample and its control (one value per sample pair) or log2 fold change between the pair. Maybe there is an even better solution?
- Which correlation metric would be better to use with this small sample size? Pearson or Spearman?
- When clustering my genes - is it better to consider genes correlated in one direction only, or both (positively and negatively correlated)?
Any help would be greatly appreciated!
Yes, WGCNA is definitely on my list of things to try with this data. I couldn't see whether it can deal with paired samples, though.
My question is more fundamental than which software to use. I would like to know how best to use the power of the paired design for co-expression analysis, and what I should be correlating here: the counts, or some measure of the change in expression within each pair?
You should make this a comment rather than a new answer. See the [ADD COMMENT] box below any existing comment or reply.
Sorry, will do that next time.
I agree with Ashutosh's suggestion of WGCNA. You can get normalized counts (e.g. CPM) from edgeR or DESeq/DESeq2 and use them as input for WGCNA.
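To make "normalized counts" concrete, and to show one way the pairing could be folded in (the within-pair log2 fold change floated in the question), here is a minimal Python sketch. It is not what edgeR or DESeq2 actually compute: it uses plain library-size scaling only, and the function names, sample names, toy counts and the prior count of 1 are all assumptions made up for the example.

```python
import math


def cpm(counts):
    """Scale raw counts to counts per million using each sample's total count.

    counts: dict mapping sample name -> dict mapping gene -> raw read count.
    """
    scaled = {}
    for sample, gene_counts in counts.items():
        library_size = sum(gene_counts.values())
        scaled[sample] = {
            gene: count * 1e6 / library_size
            for gene, count in gene_counts.items()
        }
    return scaled


def paired_log2_fold_change(cpm_values, pairs, prior=1.0):
    """Return gene -> list of log2(stimulated / control), one value per pair.

    pairs: list of (control_sample, stimulated_sample) tuples.
    prior: small value added before taking the ratio to avoid log2(0)
           (an assumption for this sketch, not a recommended setting).
    """
    genes = list(cpm_values[pairs[0][0]])
    result = {gene: [] for gene in genes}
    for control, stimulated in pairs:
        for gene in genes:
            ratio = (cpm_values[stimulated][gene] + prior) / (
                cpm_values[control][gene] + prior
            )
            result[gene].append(math.log2(ratio))
    return result


# Toy data: two pairs, two genes (hypothetical values).
raw = {
    "ctrl_1": {"geneA": 10, "geneB": 90},
    "stim_1": {"geneA": 50, "geneB": 50},
    "ctrl_2": {"geneA": 20, "geneB": 180},
    "stim_2": {"geneA": 120, "geneB": 80},
}
pairs = [("ctrl_1", "stim_1"), ("ctrl_2", "stim_2")]

lfc = paired_log2_fold_change(cpm(raw), pairs)
for gene, values in lfc.items():
    print(gene, [round(v, 2) for v in values])
```

This gives one value per gene per pair (10 values per gene in your design), which removes the between-individual baseline before any correlation is computed. Whether that is preferable to correlating the normalized counts themselves is a judgment call.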
You could also do hierarchical clustering to see which genes cluster together. It can handle both positively and negatively correlated genes, depending on the distance measure you choose.
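As a rough sketch of how the distance measure decides this: with a distance of 1 - r, anti-correlated genes end up far apart, while with 1 - |r| they fall into the same cluster. The pure-Python example below uses Pearson correlation and average linkage. The function names, gene names and profile values are hypothetical, and the naive loop is only meant for illustration; for ~2,500 genes you would want an optimized clustering library.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (fails on constant input)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cluster_genes(profiles, n_clusters, signed=True):
    """Average-linkage agglomerative clustering on a correlation distance.

    profiles: dict mapping gene -> list of values (e.g. one per sample pair).
    signed=True  uses 1 - r   (anti-correlated genes are far apart).
    signed=False uses 1 - |r| (anti-correlated genes are close together).
    Returns a list of clusters, each a list of gene names.
    """
    def dist(a, b):
        r = pearson(profiles[a], profiles[b])
        return 1 - r if signed else 1 - abs(r)

    clusters = [[gene] for gene in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster gene pairs.
                total = sum(dist(a, b) for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical per-pair profiles (e.g. log2 fold changes across four pairs).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],    # rises with geneA
    "geneC": [4.0, 3.1, 2.0, 0.9],    # falls as geneA rises
    "geneD": [1.0, -1.0, 1.5, -0.5],  # unrelated pattern
}

print("signed:  ", cluster_genes(profiles, 2, signed=True))
print("unsigned:", cluster_genes(profiles, 2, signed=False))
```

Swapping `pearson` for a rank-based correlation would give a Spearman version of the same idea.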