How to extract genes based on a list of GO terms or their child terms?
4.1 years ago

In an R session, I have a data frame with all the Escherichia coli genes and their associated GO terms. Each gene is annotated with one GO term only, representing the deepest annotation level. I then have a character vector of specific GO terms that our collaborators are interested in for their work.

I would like to extract all the genes from the first data frame that are associated with the GO terms in the character vector.

When I say "associated", I mean either carrying a GO term that is found in the vector, or a child of that term. For example, one of the GO terms in the vector is "cell death", but a gene is likely to be annotated with something much more specific that is a child term of "cell death".
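Because a gene's deepest annotation may sit several levels below a term like "cell death", "child" here effectively has to mean *any descendant*, not only direct children. In GO.db this is the difference between the `GOBPCHILDREN` map (direct children only) and the `GOBPOFFSPRING` map (all descendants) for biological-process terms. The base-R sketch below only illustrates that idea on a hypothetical toy ontology; the placeholder IDs (`GO:A` etc.) and the `get_descendants()` helper are illustrative assumptions, not real GO terms or a GO.db function.

```r
# Hypothetical toy ontology: each name is a placeholder GO-style ID and each
# value lists its DIRECT children only. These are not real GO terms.
children_map <- list(
  "GO:A" = c("GO:B", "GO:C"),
  "GO:B" = c("GO:D"),
  "GO:C" = c("GO:D", "GO:E"),
  "GO:D" = character(0),
  "GO:E" = character(0)
)

# Hypothetical helper: collect every term reachable from `term` by repeatedly
# following child links (breadth-first), so grandchildren and deeper terms
# are included. setdiff() skips terms already found, since GO is a DAG in
# which one term can have several parents.
get_descendants <- function(term, children_map) {
  found <- character(0)
  queue <- term
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    kids <- children_map[[current]]        # NULL if the term is not in the map
    new_kids <- setdiff(kids, found)
    found <- c(found, new_kids)
    queue <- c(queue, new_kids)
  }
  found
}

get_descendants("GO:A", children_map)
```

Note that `GO:D` is reached through both `GO:B` and `GO:C` but is returned only once.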

I have GO.db installed, but I'm not at all proficient with it, since this is the first time I'm doing this. How do I properly carry out this task?

Currently, my strategy would be the following:

  1. For each GO term in the character vector, extract all its child terms using the GO.db package.
  2. unlist() the results into a single character vector containing all the initial GO terms and their children.
  3. Extract all genes from the data frame whose associated GO term matches any of the initial GO terms or their child terms.
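The three steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions: `offspring_map` is a hypothetical stand-in that already lists *all* descendants of each query term (the role GO.db's offspring maps play), the gene table and IDs are invented placeholders, and the query vector is assumed to hold GO IDs rather than term names such as "cell death" (names would first need to be mapped to IDs).

```r
# Hypothetical stand-in for an offspring map: each query term maps to ALL of
# its descendants, not just its direct children. IDs are placeholders.
offspring_map <- list(
  "GO:A" = c("GO:B", "GO:C", "GO:D"),
  "GO:X" = character(0)                 # a leaf term with no descendants
)

# Hypothetical gene table: one (deepest) GO term per gene, as in the question.
genes <- data.frame(
  gene  = c("gene1", "gene2", "gene3", "gene4"),
  go_id = c("GO:D", "GO:X", "GO:Q", "GO:A"),
  stringsAsFactors = FALSE
)

query_terms <- c("GO:A", "GO:X")

# Step 1: look up the descendants of each query term (a list, one entry per term).
desc <- lapply(query_terms, function(t) offspring_map[[t]])

# Step 2: flatten into one vector, keep the query terms themselves,
# and drop duplicates (sibling branches often share descendants).
all_terms <- unique(c(query_terms, unlist(desc)))

# Step 3: keep the genes whose GO term is in the combined set.
hits <- genes[genes$go_id %in% all_terms, ]
hits
```

With GO.db, step 1 could instead be something like `mget(query_terms, GOBPOFFSPRING, ifnotfound = NA)` for biological-process terms; any `NA` entries should be removed from the flattened vector before matching. Even if ~30 terms expand to a few thousand IDs, `unique()` and `%in%` handle vectors of that size easily, so the length of the intermediate list is unlikely to be a practical problem.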

Is this the best approach? There are ~30 GO terms, and for each one I have to extract all its child terms. It sounds like the output will be a huge list.
