Duplicated genes in overrepresentation analysis (ORA)
3.9 years ago
nhaus ▴ 420

Hello,

I have gene lists from several biopsies. Each list contains every gene that carries a mutation (identified by WES). I want to run an overrepresentation analysis to check whether certain pathways are hit by mutations more often than others. However, in many lists the same gene appears multiple times because it carries more than one mutation. By default, the ORA tool I am using (WebGestalt) removes duplicated genes, but I think in my case it might be useful to keep them.

I am very new to ORA/GSEA, so I am not sure whether this makes sense. I also have not yet found a tool that lets me keep duplicated genes. If somebody could tell me whether my idea makes sense, and suggest a tool or another way to analyze these gene lists, I would be very happy!

Cheers!

ORA GSEA • 1.6k views
3.9 years ago
kelen ▴ 210

If I understand you correctly, you want an ORA tool that accepts a gene list containing duplicated genes? In that case it might be easier to input a gene list that contains only unique values, rather than guess whether the tool does any filtering. You can filter your gene list down to unique values in bash, R, Python, or even Excel.
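
As a minimal sketch of that filtering step in Python (the function name `unique_genes` and the example gene symbols are made up for illustration, not taken from your data):

```python
def unique_genes(genes):
    """Return gene symbols with duplicates removed, keeping first-seen order."""
    # Strip surrounding whitespace and skip empty entries.
    cleaned = (g.strip() for g in genes)
    # dict.fromkeys preserves insertion order, so the first occurrence wins.
    return list(dict.fromkeys(g for g in cleaned if g))


# Hypothetical list with one entry per mutation, so genes repeat.
mutated = ["TP53", "KRAS", "TP53", "BRCA2", "KRAS", "TP53"]
print(unique_genes(mutated))
```

The resulting list has one entry per gene and can be pasted into the ORA tool as is.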


I think you misunderstood me. I know that the tool I am using removes duplicated genes. However, I am not sure whether this is really best practice for my particular type of analysis. Specifically, I would like a tool that includes duplicates in its analysis.


I was answering from what I know about ORA, since I thought that was what you were asking. If you are instead looking for something specific to WES and mutations, there might be other approaches. Best practice for ORA on a simple gene list is to use unique genes as input (each duplicated gene included only once). As far as I know, no tool for this type of enrichment analysis benefits from 'including' duplicates (it doesn't really make sense either), which is why duplicates are removed or trigger a warning. That said, if you think the number of mutations per gene (the reason gene names are duplicated in your input) is important and biologically meaningful, you could rank your gene list by the number of mutations and use a tool like gprofiler2, which can test a simple ranked gene list.
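
A rough sketch of the ranking step in Python, assuming the input has one entry per mutation so that repeated gene names count mutations (the function name and the example symbols are hypothetical):

```python
from collections import Counter


def rank_by_mutation_count(genes):
    """Return (gene, count) pairs, most frequently mutated gene first."""
    counts = Counter(genes)
    # Sort by descending count; break ties alphabetically so the order is reproducible.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical list with one entry per mutation.
mutated = ["KRAS", "TP53", "BRCA2", "TP53", "KRAS", "TP53"]
ranked = rank_by_mutation_count(mutated)

# Unique gene symbols in ranked order, ready to use as an ordered query.
ordered_genes = [gene for gene, _ in ranked]
print(ranked)
print(ordered_genes)
```

The ordered, de-duplicated list could then be passed to a tool that supports ranked input; in gprofiler2 this should correspond to calling `gost()` with `ordered_query = TRUE`, but check the package documentation for how ties in the ranking are treated.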


Alright, thank you very much! I plan on comparing biopsy A vs. biopsy A + biopsy B to figure out whether some GO terms/pathways are significantly more mutated in biopsy A. Do you think this comparison, i.e. the choice of gene list and background, makes sense?
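
To make the comparison concrete, here is a minimal sketch of what it would amount to, assuming a standard one-sided hypergeometric ORA test with the unique genes of biopsy A as the query and the union of A and B as the background. All gene symbols, the pathway, and the function names are made up for illustration, and a real analysis would also need multiple-testing correction across pathways:

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when drawing n genes from N background genes, K of them in the pathway."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def ora(query, background, pathway):
    """Return (overlap size, p-value) for one pathway against a custom background."""
    background = set(background)
    # Restrict both the query and the pathway to genes present in the background.
    query = set(query) & background
    pathway = set(pathway) & background
    k = len(query & pathway)
    return k, hypergeom_pvalue(k, len(query), len(pathway), len(background))


# Hypothetical inputs: unique mutated genes per biopsy and one made-up pathway.
biopsy_a = {"TP53", "KRAS", "BRCA2"}
biopsy_b = {"EGFR", "MYC", "PTEN", "TP53"}
pathway = {"TP53", "KRAS", "EGFR", "ATM"}

overlap, p_value = ora(biopsy_a, biopsy_a | biopsy_b, pathway)
print(overlap, p_value)
```

With this setup the test asks whether pathway genes are concentrated in biopsy A relative to the pooled set of mutated genes, not relative to all genes covered by the exome capture.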
