Question

Duplicated genes in overrepresentation analysis (ORA)

3.8 years ago

nhaus ▴ 420

Hello,

I have several gene lists from different biopsies. Each list contains all genes that carry a mutation (identified by WES). I want to run an overrepresentation analysis to check whether certain pathways are hit by mutations more often than others. However, in many of the lists the same gene appears multiple times, because more than one mutation was found in that gene. By default, the ORA tool I am using (WebGestalt) removes duplicated genes, but I think that in my case it might be useful to keep them.

I am very new to ORA/GSEA, so I am not sure whether this makes sense. Furthermore, I have not yet found a tool that allows me to keep duplicated genes. If somebody could tell me whether my idea makes sense, and suggest a tool or a way to analyze the gene lists, I would be very happy!

Cheers!

ORA GSEA • 1.6k views

ADD COMMENT • link updated 3.8 years ago by kelen ▴ 210 • written 3.8 years ago by nhaus ▴ 420

score 0 · Answer 1 · 2021-01-20

3.8 years ago

kelen ▴ 210

If I understand you correctly, you want an ORA tool that accepts a gene list containing duplicated genes? In that case it might be easier to input a gene list that contains only unique values, rather than guessing whether the tool does any filtering. You can reduce your gene list to unique values in bash, R, Python, or even Excel.
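As a minimal sketch of the filtering step mentioned above, the following Python snippet removes duplicates while keeping the order in which genes first appear. The function name `unique_genes` and the example gene symbols are made up for illustration and are not taken from the original post.

```python
def unique_genes(genes):
    """Return gene symbols with duplicates removed, keeping first-seen order."""
    # Strip whitespace and skip empty entries before de-duplicating.
    cleaned = (g.strip() for g in genes if g.strip())
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(cleaned))


# Hypothetical input: TP53 and KRAS appear twice because each carries two mutations.
mutated = ["TP53", "KRAS", "TP53", "BRCA2", "KRAS", "EGFR"]
deduplicated = unique_genes(mutated)
print(deduplicated)  # ['TP53', 'KRAS', 'BRCA2', 'EGFR']
```

Note that this comparison is case-sensitive, so identifiers should be normalized to one naming scheme beforehand.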

I think you misunderstood me. I know that the tool I am using removes duplicated genes. However, I am not sure whether this is really best practice for my particular type of analysis. Specifically, I would like a tool that includes duplicates in its analysis.

ADD REPLY • link 3.8 years ago by nhaus ▴ 420

This is just based on what I know about ORA, since I thought that was what you were asking about; if you are instead looking for something specific to WES and mutations, there might be other approaches. Best practice when testing a simple gene list for enrichment with ORA is to use unique genes in the input (each duplicated gene included only once). As far as I know, no tool for this type of enrichment analysis would benefit from 'including' duplicates (it doesn't really make sense either), which is why duplicates are removed or the tool gives you a warning. That said, if you think the number of mutations (which leads to the duplicated gene names in your input) is important and biologically meaningful, you could try ranking your gene list by the number of mutations and using a tool like gprofiler2, which can test a simple ranked gene list.
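The ranking idea could be sketched roughly as follows: count how often each gene occurs in the raw (duplicated) list and sort by that count. The function name `rank_by_mutation_count` and the example symbols are hypothetical, and the alphabetical tie-break is an arbitrary choice made only so the order is reproducible.

```python
from collections import Counter


def rank_by_mutation_count(genes):
    """Return (gene, count) pairs sorted by descending mutation count."""
    counts = Counter(genes)
    # Sort by count (highest first); ties are broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical raw list in which each occurrence represents one mutation.
mutated = ["TP53", "KRAS", "TP53", "BRCA2", "KRAS", "TP53", "EGFR"]
ranked = rank_by_mutation_count(mutated)
ordered_genes = [gene for gene, _ in ranked]

print(ranked)         # [('TP53', 3), ('KRAS', 2), ('BRCA2', 1), ('EGFR', 1)]
print(ordered_genes)  # ['TP53', 'KRAS', 'BRCA2', 'EGFR']
```

The resulting `ordered_genes` list contains each gene once, most-mutated first, which is the shape of input a ranked-list test expects; in the gprofiler2 R package this would presumably be passed to `gost()` with `ordered_query = TRUE`.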

ADD REPLY • link 3.8 years ago by kelen ▴ 210

Alright, thank you very much! I plan to compare biopsy A against biopsy A + biopsy B to figure out whether some GO terms/pathways are significantly more mutated in biopsy A. Do you think this comparison, i.e. the choice of gene list and background, makes sense?

ADD REPLY • link 3.8 years ago by nhaus ▴ 420
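To make concrete where the gene list and the background enter an ORA, here is a minimal sketch of the one-sided hypergeometric test that ORA tools are typically built on, using biopsy A as the query and the union of biopsies A and B as the background. It does not settle whether that design is appropriate; the function name `ora_pvalue`, the gene symbols, and the pathway are all invented for illustration. Because the test works on sets, duplicated genes collapse to a single entry.

```python
from math import comb


def ora_pvalue(gene_list, background, pathway):
    """One-sided hypergeometric p-value P(X >= k) for pathway overrepresentation."""
    bg = set(background)
    query = set(gene_list) & bg      # duplicates collapse when converting to a set
    in_pathway = set(pathway) & bg   # only pathway genes present in the background count
    N, K, n = len(bg), len(in_pathway), len(query)
    k = len(query & in_pathway)      # observed overlap
    # Sum the hypergeometric probabilities for overlaps of size k or larger.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return favourable / comb(N, n)


# Hypothetical mutated-gene lists for two biopsies.
biopsy_a = ["TP53", "KRAS", "TP53", "EGFR"]
biopsy_b = ["BRCA2", "ATM", "KRAS", "PTEN"]
background = set(biopsy_a) | set(biopsy_b)

# Hypothetical pathway; MYC is ignored because it is not in the background.
pathway = ["TP53", "EGFR", "KRAS", "MYC"]

p_value = ora_pvalue(biopsy_a, background, pathway)
print(p_value)  # 0.05
```

With a background this small the test has little power, so the numbers only illustrate the mechanics.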