Question

Many Differentially expressed genes but few GO terms

7.1 years ago

firestar ★ 1.6k

I detect a few hundred significantly (qval < 0.05) DE genes. When I run an enrichment analysis on these genes against the GO network (using various tools), I get around 50 or so GO terms, but none of them are significant (qval < 0.05). What might be the reasons for this? Is there anything that could be tweaked?

differential-gene-expression RNA-seq gene-ontology • 4.3k views

ADD COMMENT • link updated 7 months ago by Ram 44k • written 7.1 years ago by firestar ★ 1.6k

By "qval" I assume you mean FDR adjusted p-value. This may be one of the key reasons why your GO analysis is not returning anything significant. FDR is not the best method when looking at ontologies. Read on to see why...

Due to the True Path Rule, genes associated with a GO term are also associated with all of its parent terms (for more on this, see Chapter 22 of Dr. Draghici's book [7]). This means that simply performing an enrichment analysis for each GO term counts each gene many times, which is a serious problem (see Draghici, Chapter 24). Furthermore, testing the enrichment of all GO terms is not necessary and, because of the unavoidable multiple-comparison problem, increases the number of false positives reported. Luckily, one can leverage the structure and additional properties of GO to limit the number of tests performed, and therefore the number of comparisons one must correct for. In 2006, Alexa et al. [8] proposed two methods to accomplish this: "Elim" and "Weight."
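To make the True Path Rule concrete, here is a minimal sketch of how direct annotations get propagated up the hierarchy. The term names, gene IDs, and helper functions are made up for illustration; a real GO graph is much larger and a term can have several parents.

```python
# Toy GO fragment (hypothetical terms and genes, for illustration only).
# Maps each term to its direct parents.
PARENTS = {
    "response_to_amphetamine": ["response_to_chemical"],
    "response_to_chemical": ["response_to_stimulus"],
    "response_to_stimulus": [],
}

# Direct annotations: gene -> the most specific terms it is annotated to.
DIRECT = {
    "geneA": ["response_to_amphetamine"],
    "geneB": ["response_to_chemical"],
    "geneC": ["response_to_amphetamine"],
}


def ancestors(term, parents):
    """Return every ancestor of `term`, excluding the term itself."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def propagate(direct, parents):
    """Apply the True Path Rule: each term inherits the genes of its descendants."""
    term_genes = {term: set() for term in parents}
    for gene, terms in direct.items():
        for term in terms:
            term_genes[term].add(gene)
            for anc in ancestors(term, parents):
                term_genes[anc].add(gene)
    return term_genes


term_genes = propagate(DIRECT, PARENTS)
for term, genes in term_genes.items():
    print(term, sorted(genes))
```

In this toy case geneA and geneC are each counted in three terms, which is why the per-term tests are far from independent.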

For example, in iPathwayGuide and iVariantGuide we offer both methods, each of which follows the same outline:

1. Decouple GO terms from one another
2. Perform significance tests
3. Correct for multiple comparisons

Elim

The Elim method assesses the significance of GO terms starting with the most specific terms. The benefit of this approach is that it is easier to find specialized terms that are significant, e.g. "response to amphetamine" is more descriptive than "response to chemical." This approach provides a very nice custom cut through the GO hierarchy that "magically" identifies the lowest level of abstraction that contains the significant GO terms in the given experiment.
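A rough sketch of the Elim idea follows; it is not the published implementation. It assumes a one-sided hypergeometric test, a fixed gene universe, and a made-up cutoff of 0.01. The term names, gene IDs, and helper functions are hypothetical. Terms are tested from most specific to most general, and whenever a term is significant its genes are removed from all of its ancestors before those are tested.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n DE genes are drawn from N genes, K of which are in the term."""
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def ancestors(term, parents):
    """Return every ancestor of `term`, excluding the term itself."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def elim(term_genes, parents, bottom_up_order, de_genes, universe, alpha=0.01):
    """Test specific terms first; strip genes of significant terms from ancestors."""
    current = {term: set(genes) for term, genes in term_genes.items()}
    n_universe, n_de = len(universe), len(de_genes)
    pvalues = {}
    for term in bottom_up_order:
        genes = current[term]
        hits = len(genes & de_genes)
        p = hypergeom_pvalue(hits, n_de, len(genes), n_universe)
        pvalues[term] = p
        if p < alpha:
            for anc in ancestors(term, parents):
                current[anc] -= genes
    return pvalues


# Toy data (hypothetical): 20 genes, 5 of them DE.
UNIVERSE = {f"g{i}" for i in range(20)}
DE = {"g0", "g1", "g2", "g3", "g4"}
PARENTS = {
    "response_to_amphetamine": ["response_to_chemical"],
    "response_to_chemical": [],
}
TERM_GENES = {
    "response_to_amphetamine": {"g0", "g1", "g2", "g3"},
    "response_to_chemical": {f"g{i}" for i in range(8)},
}
ORDER = ["response_to_amphetamine", "response_to_chemical"]  # most specific first

elim_p = elim(TERM_GENES, PARENTS, ORDER, DE, UNIVERSE)
naive_parent_p = hypergeom_pvalue(5, len(DE), 8, len(UNIVERSE))
print("elim:", elim_p)
print("parent tested on its own:", naive_parent_p)
```

In this toy example the parent term looks significant when tested on its own, but only because it inherits the genes of the specific child term. Once those genes are removed, the parent is no longer significant and only the more descriptive term is reported.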

Weight

Given a set of related GO terms, the Weight method is designed to identify the term that best represents the genes of interest, regardless of where the term falls in the hierarchy. This approach is less stringent than Elim, capturing more true positives with the drawback of including additional false positives.

References

[1] Khatri, P., Sirota, M., & Butte, A. J. (2012). Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Comput Biol, 8(2), e1002375.
[2] Rhee, S. Y., Wood, V., Dolinski, K., & Draghici, S. (2008). Use and misuse of the gene ontology annotations. Nature Reviews Genetics, 9(7), 509-515.
[3] Dunn, O. J. (1959). Confidence intervals for the means of dependent, normally distributed variables. Journal of the American Statistical Association, 54(287), 613-621.
[4] Dunn, O. J. (1961). Multiple comparisons among means. Journal of the American Statistical Association, 56(293), 52-64.
[5] Benjamini, Y. & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 289-300.
[6] Benjamini, Y. & Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 1165-1188.
[7] Drăghici, S. (2011). Statistics and data analysis for microarrays using R and Bioconductor. CRC Press.
[8] Alexa, A., Rahnenführer, J., & Lengauer, T. (2006). Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics, 22(13), 1600-1607.

ADD REPLY • link updated 7 months ago by Ram 44k • written 7.1 years ago by andrew ▴ 560

Yes, by q-value I mean the Benjamini-Hochberg adjusted p-value, which I think is the same as the FDR adjusted p-value. But this is some new insight. Thanks for this.
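For reference, a minimal sketch of how Benjamini-Hochberg adjusted p-values are computed. The function name and the raw p-values are made up for illustration; in practice you would use whatever adjustment your enrichment tool provides.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.001, 0.008, 0.039, 0.041, 0.60]  # hypothetical raw p-values
adjusted = bh_adjust(raw)
for p, q in zip(raw, adjusted):
    print(f"raw={p:.3f}  adjusted={q:.5f}")
```

Note that each raw p-value is scaled by the number of tests divided by its rank, so the more terms are tested, the harder it is for any single term to stay below 0.05.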

ADD REPLY • link 7.1 years ago by firestar ★ 1.6k

Why do you have to find significantly enriched GO terms? There are many reasons why no significant enrichment is found, and that is a perfectly acceptable result. GO annotations incompletely capture current knowledge. If you're looking at something new, genes may not be well annotated with the corresponding terms, so you will most likely not see any enrichment because the usual approaches favour gene sets with many annotations (see this paper). Also, your threshold of 0.05 is entirely arbitrary. What if you had terms with a q-value of 0.0509? The values you get also depend on the approach to enrichment analysis you take. Many approaches run a lot of unnecessary tests, which reduces detection power. For example, if your experiment is only concerned with cellular functions, you don't need to test for GO terms such as foraging behaviour, i.e. in general, terms that are not below the cellular process term. In addition, many methods make unnecessary tests by not taking into account redundancies in the annotations (see this paper).
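To illustrate the point about unnecessary tests, here is a small sketch with made-up numbers. It assumes Benjamini-Hochberg adjustment and a single term with a raw p-value of 2e-5, padded with uninformative terms (p = 1.0). The counts of 5000 and 1000 terms are hypothetical.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


best_p = 2e-5  # hypothetical raw p-value of the most enriched term

# Testing every term in the ontology vs. only terms under "cellular process".
all_terms = [best_p] + [1.0] * 4999
cellular_only = [best_p] + [1.0] * 999

adj_all = bh_adjust(all_terms)[0]
adj_restricted = bh_adjust(cellular_only)[0]
print(f"adjusted over 5000 tests: {adj_all:.3f}")
print(f"adjusted over 1000 tests: {adj_restricted:.3f}")
```

The same raw p-value misses the 0.05 cutoff when 5000 terms are tested and passes it when only 1000 are. The restriction has to be decided before looking at the results, otherwise the correction no longer means what it claims.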

ADD REPLY • link 7.1 years ago by Jean-Karim Heriche 27k

Hmmm... I am looking for significantly enriched or depleted GO terms because I don't want the result to be due to chance. If I take some random genes, there is a chance that I am going to get some GO terms. But is that reliable? I understand 0.05 is arbitrary, but my q-values for the returned GO terms are 0.8, 0.9 etc., so even if I relaxed 0.05 to 0.06 or even 0.1, it wouldn't make any difference. Another reason is that I have two datasets (different tissues) and this issue occurs with only one of them. As you say, there are different implementations of enrichment analysis, and I have tried a few (DAVID, clusterProfiler, goana, ClueGO etc.). Although the lists of GO terms differ quite a bit, none of the terms are significant.

ADD REPLY • link 7.1 years ago by firestar ★ 1.6k

I don't know about others, but each time that I do gene enrichment analysis, I come back disappointed by results that rarely make sense and that only add confusion. Do any unbiased gene enrichment on a large chunk of genes, and cancer and immune pathways always come back. One thing that equally worries me is that I have heard how some people, even in clinical settings, are forming conclusions based on in silico gene enrichment results.

If you cannot even get a significant enrichment term, then I would suggest doing a manual literature search and switching off autopilot for the remainder of your study. I don't mean to be critical or anything, but I have fundamental doubts about gene enrichment based on my own and others' experiences. Nor do I want to sound old (mid 30s), but I remember the days when we had to do literature searches, and it was actually fun trying to piece together the jigsaw. I really worry about how technology is attempting to replace our creative brains.

ADD REPLY • link 7.1 years ago by Kevin Blighe 88k

Your reply reminds me of an email I got sometime back from the author of a popular GO enrichment tool. Quote:

"But it's rare that anybody interprets GO enrichment analysis as being actionable information. They tend to run it, mention it in their paper as justifying the results as being "reasonable", and then pretty much ignore it when they plan their next experiments. It beats staring at the gene list, but it's actually not all that useful since it depends heavily on what GO has decided to annotate - which is not static."

ADD REPLY • link 7.1 years ago by firestar ★ 1.6k

Could be true what s/he says!

ADD REPLY • link 7.1 years ago by Kevin Blighe 88k

Need some more details. The choice of tool matters depending on the type of organism you are looking at.

1) Which tools did you use? 2) Which parameters did you use in each tool?

ADD REPLY • link 7.1 years ago by EagleEye 7.6k