Which tool do you recommend for GO enrichment analysis?
4.2 years ago
eridanus ▴ 40

Hello. Do you think DAVID is OK for GO enrichment analysis, or should I pick another web-based or R tool? Thank you in advance.

RNA-Seq go • 8.3k views
4.2 years ago

First you have to decide what kind of analysis you want to do:

1- Over-Representation Analysis (ORA): a widely used method for testing whether the genes in a biological pathway or GO term are over-represented (enriched) in your gene list, which may come from a differential expression analysis. There are many web tools and R packages you can choose from for this kind of analysis. As far as I know, DAVID was last updated in 2016. I am still a fan of it, but the other tools mentioned in this thread should work fine too.
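To make the ORA idea concrete, here is a minimal sketch of the one-sided hypergeometric test that over-representation tools are typically built around. The function name and all the counts are hypothetical, and real tools differ in details (DAVID, for example, uses a modified Fisher exact test), so treat this as an illustration rather than any tool's actual implementation.

```python
from math import comb

def ora_p_value(n_background, n_in_term, n_in_list, n_overlap):
    """One-sided hypergeometric p-value: P(overlap >= n_overlap).

    n_background: genes in the background (universe)
    n_in_term:    background genes annotated to the GO term
    n_in_list:    genes in your list (e.g. DE genes)
    n_overlap:    genes in your list that are annotated to the term
    """
    total = comb(n_background, n_in_list)
    upper = min(n_in_term, n_in_list)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_in_list - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 10,000 background genes, 200 annotated to the term,
# 300 DE genes, 15 of which carry the term (about 6 expected by chance).
p = ora_p_value(10_000, 200, 300, 15)
print(f"p = {p:.3g}")
```

A tool runs a test like this once per GO term, which is why the resulting p-values need a multiple-testing correction afterwards.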

2- Gene Set Enrichment Analysis (GSEA): developed at the Broad Institute. This is the preferred method when the genes come from an expression experiment such as microarray or RNA-seq. The original methodology was designed for microarrays, but later modifications made it suitable for RNA-seq as well. In this approach you rank your genes by a statistic (such as one reported by DESeq2) and then test for enrichment against different pathways (gene sets). You have to download the gene set file yourself. The key point is that the algorithm uses all the genes in your ranked list [in contrast to ORA, where only the genes that pass a specific threshold (such as the DE ones) are used]. You can find more details about the methodology in the original PNAS paper. Here is a summary of why one should use this approach instead of ORA:

1- After correcting for multiple hypothesis testing, no individual gene may meet the threshold for statistical significance.

2- On the other hand, one may be left with a long list of statistically significant genes without any unifying biological theme.

3- Cellular processes often affect sets of genes acting in concert, so ORA may miss important effects on pathways.

The GSEA software can be found on its homepage. There are also Bioconductor packages that take a similar approach to GSEA; the one I like to use is fgsea.
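For intuition about what "using the whole ranked list" means, below is a rough sketch of the running-sum enrichment score described in the PNAS paper: walk down the ranked list, step up at genes in the set (weighted by their statistic) and step down otherwise, then take the largest deviation from zero. The gene names, statistics, and function name are made up for illustration, and the sketch leaves out the permutation step that GSEA and fgsea use to assign significance.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """Running-sum enrichment score for one gene set.

    ranked:   list of (gene, statistic) pairs, sorted by statistic, descending
    gene_set: set of gene names
    p:        weight exponent (p=0 gives the unweighted, KS-like score)
    """
    weights = [abs(stat) ** p if gene in gene_set else 0.0 for gene, stat in ranked]
    hit_total = sum(weights)
    n_miss = sum(1 for gene, _ in ranked if gene not in gene_set)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    running, best = 0.0, 0.0
    for (gene, _), w in zip(ranked, weights):
        if gene in gene_set:
            running += w / hit_total   # step up at a hit
        else:
            running -= 1.0 / n_miss    # step down at a miss
        if abs(running) > abs(best):
            best = running             # keep the maximum deviation from zero
    return best

# Hypothetical ranked list (e.g. by a DESeq2 test statistic).
ranked = [("A", 3.0), ("B", 2.5), ("C", 1.0), ("D", -0.5), ("E", -2.0), ("F", -3.0)]
es_top = enrichment_score(ranked, {"A", "B"})      # set sits at the top of the list
es_bottom = enrichment_score(ranked, {"E", "F"})   # set sits at the bottom
print(es_top, es_bottom)
```

A positive score means the set is concentrated among the top-ranked genes, a negative score among the bottom-ranked ones.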

4.2 years ago

One of my first choices is AgriGO; I like the way it presents results. The downside is that it is a web-based tool that can go (and has gone) offline for days or weeks at a time.

http://systemsbiology.cau.edu.cn/agriGOv2/

In addition, the implementation of its algorithms is not clearly documented.


Here is an output from AgriGO. I find it very informative, and it is the main reason I recommend the service.

[image: AgriGO output]


Thanks a lot for your reply, I'll check it out! So do you think DAVID is outdated and not as useful anymore?


Hmm, the truth is I am interested in Homo sapiens, so maybe AgriGO is not the ideal tool?


Does it not have human annotations? They started out with a plant (and agricultural) focus, but I believe they have added many species since. Go to the species list and select human.

As for DAVID, it is also widely used; it has a lot of critics but supporters as well. The interface is counterintuitive, and the default significance cutoff is applied to the uncorrected EASE score (the adjusted p-values sit in separate columns of the output), so it is easy to find just about anything, which also makes the interpretation more subjective.
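Since multiple comparisons come up here, this is a small sketch of the Benjamini-Hochberg adjustment that many enrichment tools report next to the raw p-values. The p-values are invented for illustration; in practice you would rely on the adjusted values your tool outputs (or on `p.adjust` in R) rather than a hand-rolled version.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for five GO terms.
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
adj = benjamini_hochberg(raw)
for r, a in zip(raw, adj):
    print(f"raw={r:.3f}  adjusted={a:.5f}")
```

In this toy example four terms pass a raw cutoff of 0.05, but only two still pass after adjustment, which is the kind of difference the criticism above is about.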

4.2 years ago
GenoMax 147k

You can also try AmiGO.

4.2 years ago

I personally prefer g:Profiler, as it allows you to set your own background. It also has an R package.


Seconding this. The background option is why I prefer it over most alternatives. goseq could be an option as well.


What exactly do you mean by setting the background? Thank you!


I have an explanation of it here.


Hello again. I just wanted to ask about the g:Profiler parameters. In my DEG table there are a lot of genes that are not annotated; should I select "all known genes" under "Statistical domain scope"? Thanks a lot!


I would use a custom background containing all the genes expressed in your samples.
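To show why the background matters, here is a toy sketch that runs the same hypergeometric test against two universes: every gene in the genome versus only the genes expressed in the samples. All gene IDs and counts are invented and the helper names are hypothetical; the point is only that a term made up of tissue-expressed genes looks far more enriched when the background includes thousands of genes that could never have been called DE.

```python
from math import comb

def hypergeom_tail(n_background, n_in_term, n_in_list, n_overlap):
    """One-sided hypergeometric p-value: P(overlap >= n_overlap)."""
    upper = min(n_in_term, n_in_list)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_in_list - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_background, n_in_list)

def ora_with_background(de_genes, term_genes, background):
    """Restrict the DE list and the term to the background, then test."""
    background = set(background)
    de = set(de_genes) & background
    term = set(term_genes) & background
    return hypergeom_tail(len(background), len(term), len(de), len(de & term))

# Hypothetical toy data.
genome = {f"g{i}" for i in range(2000)}     # every annotated gene
expressed = {f"g{i}" for i in range(500)}   # genes detected in the samples
term = {f"g{i}" for i in range(0, 100)}     # GO term: 100 genes, all expressed
de = {f"g{i}" for i in range(80, 130)}      # 50 DE genes, 20 of them in the term

p_genome = ora_with_background(de, term, genome)
p_expressed = ora_with_background(de, term, expressed)
print(f"genome background:    p = {p_genome:.3g}")
print(f"expressed background: p = {p_expressed:.3g}")
```

With the expressed-gene background the term is still enriched here, but the p-value is much less extreme, because the test no longer gets credit for genes that were never detectable in the experiment.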


You might look at whatismygene.com. It is an off-the-radar enrichment site that pays serious attention to backgrounds and has an accompanying blog and a huge database.
