I am looking for a method to find over-represented pathways across multiple lists of differentially expressed genes. The simple scenario: I have RNA-seq expression data for cell lines with derived resistance to a drug. For each resistance model (cell line), I can compare the resistant line with its parental line to get a list of differentially expressed genes (DEGs) in resistant vs. parental. I can then identify over-represented pathways in that gene list (e.g., using DAVID).
Now suppose I have six different cell lines with derived resistance, each with its own list of differentially expressed genes (relative to its parental line). I can perform pathway analysis on each DEG list and then see which pathways come up as significant in multiple lists.
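For reference, the per-list analysis described above can be sketched as a one-sided hypergeometric test for each pathway, followed by counting how many DEG lists each pathway is significant in. This is a minimal sketch, not DAVID's actual procedure (DAVID reports a modified Fisher's exact test, the EASE score). The function names, the `alpha` cutoff, and the dict-based input format are assumptions for illustration, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric: N background genes, K in the pathway, n drawn."""
    upper = min(n, K)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def enriched_pathways(deg_list, pathway_genes, background, alpha=0.05):
    """Return {pathway: p-value} for pathways with p < alpha in a single DEG list."""
    degs = set(deg_list) & background
    n, N = len(degs), len(background)
    hits = {}
    for pathway, genes in pathway_genes.items():
        members = set(genes) & background
        k = len(degs & members)
        if k == 0:
            continue  # no overlap, nothing to test
        p = hypergeom_pvalue(k, n, len(members), N)
        if p < alpha:
            hits[pathway] = p
    return hits


def count_significant(deg_lists, pathway_genes, background, alpha=0.05):
    """Count in how many DEG lists each pathway is individually significant."""
    counts = {}
    for deg_list in deg_lists.values():
        for pathway in enriched_pathways(deg_list, pathway_genes, background, alpha):
            counts[pathway] = counts.get(pathway, 0) + 1
    return counts
```

A pathway with one or two DEGs in every list will usually fail the per-list cutoff everywhere and never show up in these counts, which is the gap the question below is about.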
My question is: how do I find pathways that may not be significantly enriched in any single list but are consistently represented by one or more genes in all (or many) of the lists? I thought I would start by mapping each gene in each DEG list to all of its associated pathways (e.g., using KEGG) and then look for pathways that come up in all models. But how do I define which of those are significant? I think I want a test that considers the degree of pathway over-representation both within each list and across lists. Has anyone seen a method for that?
Sample data (completely fictional). How do I identify pathways (e.g., DNARepair) that are enriched across multiple lists but not necessarily within any one list?
GeneListA
TP53 DNARepair
BIRC5 Apoptosis
MYC Transcription factor
...
GeneListB
EZH2 Chromatin
BRCA1 DNARepair
...
GeneListC
RB1 DNARepair
etc...
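Applied to the visible part of the fictional sample data, the gene-to-pathway mapping described above amounts to counting, for each pathway, how many lists contain at least one of its genes. This is a minimal Python sketch: the dictionary layout and `pathway_coverage` are hypothetical, each gene is allowed several pathways (as in KEGG), and the elided rows ("...", "etc...") are left out rather than filled in. These raw counts ignore pathway size and list size, which is why on their own they cannot say which pathways are significant.

```python
from collections import defaultdict

# Only the fictional entries visible in the post; each gene maps to a list of pathways.
gene_lists = {
    "GeneListA": {"TP53": ["DNARepair"], "BIRC5": ["Apoptosis"], "MYC": ["Transcription factor"]},
    "GeneListB": {"EZH2": ["Chromatin"], "BRCA1": ["DNARepair"]},
    "GeneListC": {"RB1": ["DNARepair"]},
}


def pathway_coverage(gene_lists):
    """Return {pathway: (number of lists hit, total gene hits across lists)}."""
    lists_hit = defaultdict(set)
    genes_hit = defaultdict(int)
    for list_name, annotations in gene_lists.items():
        for gene, pathways in annotations.items():
            for pathway in pathways:
                lists_hit[pathway].add(list_name)
                genes_hit[pathway] += 1
    return {p: (len(lists_hit[p]), genes_hit[p]) for p in lists_hit}


coverage = pathway_coverage(gene_lists)
# Pathways touched by the most lists first.
for pathway, (n_lists, n_genes) in sorted(coverage.items(), key=lambda kv: -kv[1][0]):
    print(f"{pathway}\t{n_lists} lists\t{n_genes} genes")
```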
Thanks, Alex. WGCNA seems very much on topic but does not do exactly what I am looking for. The random permutation test is a good suggestion and may be the way to go. Given three lists of genes and their pathway annotations, I can define some measure of over-representation of each pathway both within and across the three lists. Then, using randomly generated gene lists of the same sizes (drawn from the same background list), I can see how often that level of over-representation is observed by chance.
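The permutation idea can be sketched as follows. The statistic is a hypothetical choice, not one proposed in the thread: for each pathway, the sum over lists of log(1 + number of pathway genes in that list). It grows with hits inside a list, but because it is concave it rewards hits spread over many lists more than the same number of hits concentrated in one. Random lists of matched sizes are drawn uniformly from the background, and the empirical p-value is the add-one-corrected fraction of permutations scoring at least as high as the observed lists. The function names and parameters (`n_perm`, `seed`) are assumptions, and DEG lists are assumed to be subsets of the background.

```python
import math
import random


def pathway_score(gene_lists, members):
    """Hypothetical statistic: sum over lists of log(1 + pathway hits in that list)."""
    return sum(math.log1p(len(set(genes) & members)) for genes in gene_lists)


def permutation_test(gene_lists, pathway_genes, background, n_perm=1000, seed=0):
    """Empirical p-value per pathway using random gene lists of matched sizes."""
    rng = random.Random(seed)
    bg = sorted(background)  # fixed order so the seed gives reproducible draws
    members = {p: set(g) & background for p, g in pathway_genes.items()}
    observed = {p: pathway_score(gene_lists, m) for p, m in members.items()}
    exceed = {p: 0 for p in members}
    sizes = [len(set(genes)) for genes in gene_lists]
    for _ in range(n_perm):
        # One random list per DEG list, same size, drawn without replacement.
        random_lists = [rng.sample(bg, size) for size in sizes]
        for p, m in members.items():
            if pathway_score(random_lists, m) >= observed[p]:
                exceed[p] += 1
    # Add-one correction so an empirical p-value is never exactly zero.
    return {p: (exceed[p] + 1) / (n_perm + 1) for p in members}
```

With thousands of KEGG pathways, the empirical p-values still need a multiple-testing correction (e.g., Benjamini-Hochberg), and `n_perm` must be large enough that the smallest achievable p-value, 1/(n_perm + 1), can survive it. Uniform sampling from the background also ignores any bias in which genes tend to be called differentially expressed.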