Hi,
Say I've carried out two differential expression analyses using the limma-voom pipeline on two datasets, A and B.
How could one go about comparing the results from these two analyses to see how similar or different they are?
For example, there may be genes that are differentially expressed in set A but not in set B, and vice versa.
Also, there may be genes that are differentially expressed in both set A and set B, but change in opposite directions.
There also may be genes that are behaving similarly in set A and in set B.
This is quite a broad question but I'm wondering if anyone has ideas about how to investigate this. In my particular case I'd like to see how well the results from set A replicate in set B. Is the best way just to count the numbers of genes that fall into the above categories?
Thank you
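For what it's worth, the counting described above could be sketched roughly like this. This is only a minimal illustration in plain Python, not part of limma: the function name `classify_genes`, the input layout (a dict mapping gene ID to a `(logFC, adjusted p-value)` pair, e.g. exported from each `topTable` result), the 0.05 cutoff and the toy values are all assumptions made for the example.

```python
from collections import Counter


def classify_genes(res_a, res_b, alpha=0.05):
    """Assign each gene tested in both datasets to a category.

    res_a, res_b: dict of gene ID -> (logFC, adjusted p-value).
    This layout is hypothetical; adapt it to however the results were exported.
    """
    categories = {}
    # Only genes present in both result tables can be compared.
    for gene in res_a.keys() & res_b.keys():
        lfc_a, p_a = res_a[gene]
        lfc_b, p_b = res_b[gene]
        sig_a = p_a < alpha
        sig_b = p_b < alpha
        if sig_a and sig_b:
            same_sign = (lfc_a > 0) == (lfc_b > 0)
            categories[gene] = (
                "both_same_direction" if same_sign else "both_opposite_direction"
            )
        elif sig_a:
            categories[gene] = "A_only"
        elif sig_b:
            categories[gene] = "B_only"
        else:
            categories[gene] = "neither"
    return categories


# Made-up values, purely to show the shape of the input.
res_a = {"g1": (1.5, 0.001), "g2": (-2.0, 0.01), "g3": (0.8, 0.03),
         "g4": (0.1, 0.9), "g5": (1.0, 0.2)}
res_b = {"g1": (1.2, 0.002), "g2": (1.7, 0.02), "g3": (0.3, 0.4),
         "g4": (0.0, 0.8), "g5": (1.1, 0.01)}

categories = classify_genes(res_a, res_b)
counts = Counter(categories.values())
print(counts)
```

One caveat with this kind of tally: a gene landing in "A_only" only means it missed the cutoff in B, which is not the same as evidence that it is unchanged in B, so the counts depend heavily on the threshold and on the power of each dataset.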
Comparing the actual gene lists is statistically VERY problematic. Have you considered analysing the two datasets together as a single (multifactorial) dataset? There may be a batch effect between them, but a combined analysis is statistically more robust than comparing two separate lists.
I guess you would need to show that the overall expression changes are similar between these datasets. How about using something like Gene Set Enrichment Analysis to try to show that the differentially expressed genes lead to enrichment for the same gene sets / pathways in both? Comparing datasets 1:1 is always problematic because things like different library preparation kits can lead to different results regardless of the underlying biology. Therefore I always find it helpful if the biological message is the same in both datasets. Alternatively, you could cluster both datasets, assign KEGG (or similar) pathways to the different clusters, and then see whether the results are biologically comparable.
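As a rough first check of whether the overall expression changes are similar, one option is a rank correlation of the log fold changes over the genes tested in both datasets, without applying any significance cutoff. The sketch below is plain Python and swaps in Spearman correlation as an illustrative choice; it is not GSEA and not something limma provides. The function names and the gene ID to logFC dicts are assumptions for the example, and the values are made up.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; fails with ZeroDivisionError if x or y is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def logfc_spearman(lfc_a, lfc_b):
    """Spearman correlation of logFC over genes present in both dicts."""
    genes = sorted(lfc_a.keys() & lfc_b.keys())
    x = [lfc_a[g] for g in genes]
    y = [lfc_b[g] for g in genes]
    return pearson(average_ranks(x), average_ranks(y))


# Made-up logFC values; "g9" exists only in B and is ignored.
lfc_a = {"g1": 1.5, "g2": -2.0, "g3": 0.8, "g4": 0.1}
lfc_b = {"g1": 1.2, "g2": -1.0, "g3": 0.9, "g4": 0.2, "g9": 3.0}

rho = logfc_spearman(lfc_a, lfc_b)
print(rho)
```

A high correlation would suggest the two datasets agree on direction and relative size of the changes, though it says nothing about whether technical differences such as library preparation are driving part of the signal.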