Hello,
I have several gene lists from different biopsies. Each gene list contains all genes that carry a mutation (identified by WES). I want to run an over-representation analysis (ORA) to check whether certain pathways are hit by mutations more often than others. However, many gene lists contain the same gene multiple times, because that gene happened to carry more than one mutation. By default, the ORA tool I am using (WebGestalt) removes duplicated genes, but I think that in my case it might be useful to keep them.
I am very new to ORA/GSEA, so I am not sure if this makes sense. I also have not yet found a tool that allows me to keep duplicated genes. If somebody could tell me whether my idea makes sense and suggest a tool or an approach for analyzing these gene lists, I would be very happy!
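To make the duplicate issue concrete, here is a toy sketch with made-up gene names and a made-up pathway (both purely hypothetical, not from any real data). It compares the pathway overlap counted on unique genes, which is what a standard ORA tool uses, with the overlap counted on the raw list including duplicates.

```python
# Hypothetical mutation calls for one biopsy: one entry per mutation,
# so a gene with several mutations appears several times.
mutated_genes = ["TP53", "TP53", "TP53", "KRAS", "APC", "PIK3CA"]

# Hypothetical pathway gene set, for illustration only.
pathway = {"TP53", "KRAS"}

# What a standard ORA tool does first: collapse the list to unique genes.
unique_genes = set(mutated_genes)

# Overlap with the pathway counted on unique genes.
overlap_unique = len(unique_genes & pathway)

# Overlap counted on the raw list, so every extra mutation adds one more "hit".
overlap_with_dups = sum(1 for gene in mutated_genes if gene in pathway)

print(overlap_unique)     # 2
print(overlap_with_dups)  # 4
print(len(pathway))       # 2
```

With duplicates, the "overlap" (4) is larger than the pathway itself (2). The hypergeometric/Fisher test behind classic ORA treats the input as a set of distinct genes drawn from a background of distinct genes, so that count has no meaning in the model. This is why such tools deduplicate.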
Cheers!
I think you misunderstood me. I know that the tool I am using removes duplicated genes. However, I am not sure whether this is really best practice for my particular type of analysis. Specifically, I would like a tool that includes duplicates in its analysis.
This is just based on what I know about ORA, since I thought that is what you were asking; if you are instead looking for something specific to WES and mutation data, there might be other approaches. Best practice when testing a simple gene list for enrichment with ORA is to use unique genes in the input (each duplicated gene included only once). As far as I know, no tool for this type of enrichment analysis (EA) benefits from 'including' duplicates (it doesn't really make sense either), which is why the duplicates are removed or you get a warning. That said, if you think the number of mutations (which is what causes the duplicated gene names in your input) is important and biologically meaningful, you could rank your gene list by the number of mutations and use a tool such as gprofiler2, which can test a simple ranked gene list.
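A minimal sketch of the ranking step, assuming you start from a list with one gene name per mutation (the gene names here are hypothetical). It collapses duplicates into a mutation count per gene and orders genes from most to least mutated; ties are broken alphabetically only so the output is deterministic.

```python
from collections import Counter

# Hypothetical input: one entry per mutation call.
mutated_genes = ["TP53", "KRAS", "TP53", "APC", "TP53", "KRAS", "PIK3CA"]

# Number of mutations per gene (duplicates become counts).
counts = Counter(mutated_genes)

# Unique genes ranked by mutation count (descending), then by name for ties.
ranked = sorted(counts, key=lambda gene: (-counts[gene], gene))

print(ranked)  # ['TP53', 'KRAS', 'APC', 'PIK3CA']
```

The resulting list contains each gene once, so it is valid ORA input, while the order carries the mutation counts. In gprofiler2 (R), an ordered list like this can be tested with `gost(..., ordered_query = TRUE)`. Whether mutation count is a sensible ranking (e.g. longer genes tend to accumulate more mutations) is a separate question you may want to consider.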
Alright, thank you very much! I plan to compare biopsy A against biopsy A + biopsy B to figure out whether some GO terms/pathways are significantly more mutated in biopsy A. Do you think this comparison, i.e., the choice of gene list and background, makes sense?
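One way to express that setup is as a standard one-sided hypergeometric ORA, with the unique genes mutated in biopsy A as the query and the unique genes mutated in A or B as the background. The sketch below uses hypothetical gene names (G1, G2, ...) and a hypothetical pathway, and computes the upper-tail p-value with the standard library only. Pathway genes outside the background are dropped, as ORA tools do with their reference set.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(M genes, n in pathway, N drawn)."""
    upper = min(n, N)
    total = comb(M, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total


def ora_pvalue(query, background, pathway):
    """One-sided ORA p-value on unique genes (duplicates are collapsed)."""
    query = set(query)
    universe = set(background)
    assert query <= universe, "query genes must be part of the background"
    pw = set(pathway) & universe  # ignore pathway genes outside the background
    k = len(query & pw)           # observed overlap
    return hypergeom_sf(k, len(universe), len(pw), len(query))


# Hypothetical mutated genes per biopsy (G1 is mutated twice in A).
biopsy_a = ["G1", "G2", "G3", "G4", "G1"]
biopsy_b = ["G5", "G6", "G7", "G8", "G9", "G10", "G2"]
background = set(biopsy_a) | set(biopsy_b)  # A + B: 10 unique genes

# Hypothetical pathway; G11 is not in the background and is dropped.
pathway = {"G1", "G2", "G3", "G11"}

p = ora_pvalue(biopsy_a, background, pathway)
print(round(p, 4))  # 0.0333
```

Because A is contained in the background, this asks whether a pathway is over-represented among A's mutated genes relative to all genes mutated in either biopsy, which is not quite the same as a direct A vs B test. With many GO terms/pathways, the p-values would also need multiple-testing correction.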