Hi,
I'm looking for an easy way to retrieve all the genes in a list that are associated with a certain GO term, preferably using R/Bioconductor packages. I'm not interested in under/overrepresentation or enrichment.
For instance, say I have a list of 1000 genes and I want to create a sublist with only the genes known to be involved in 'heart development'.
Thanks!
Hi, I've been trying to use this. It worked ages ago, but now I keep getting an error saying
Any suggestions? Thanks.
Change "go_id" to "go".
You can find the valid filter names with
listFilters(ensembl)
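For reference, here is a minimal sketch of a query using the renamed filter. It is not the original answer's code: the function name `genes_with_go` is made up for illustration, the human dataset is an assumption, and GO:0007507 (heart development) is only an example term. `my_genes` stands for your own gene list.

```r
# Sketch only: needs the biomaRt package and network access when called.
# Returns the HGNC symbols of genes that pass the GO term filter.
genes_with_go <- function(go_term, dataset = "hsapiens_gene_ensembl") {
  ensembl <- biomaRt::useMart("ensembl", dataset = dataset)
  hits <- biomaRt::getBM(attributes = "hgnc_symbol",
                         filters    = "go",   # older code used "go_id" here
                         values     = go_term,
                         mart       = ensembl)
  unique(hits$hgnc_symbol)
}

# Example usage (assumes `my_genes` is a character vector of HGNC symbols):
# heart_genes <- genes_with_go("GO:0007507")
# my_sublist  <- intersect(my_genes, heart_genes)
```

Only `hgnc_symbol` is requested as an attribute on purpose. If you also ask for `go_id`, BioMart returns every GO annotation of each matching gene, not just the term you filtered on.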
This seems to return only the genes annotated directly to the given GO term, and not the genes annotated to its child terms. Usually one is interested in ALL the genes for a GO term, i.e., those with either direct or indirect annotations.
Forgive me if I misunderstand something here.
According to GO.db, GO:1903452 should be a child of GO:1903450 and has no children of its own. However, I get nothing from
while I can retrieve 25 rows for RAB11FIP4, each with a different go_id.
And none of these go_ids is an ancestor of GO:1903452.
So what is going on here? And how can I know whether I have REALLY retrieved ALL the genes associated with a certain GO term and nothing else?
Hi ZeroFung,
I recently ran into a similar issue: a lot of the tools I tried did not capture the genes annotated to the child terms. What I ended up doing is:
1) downloading the GO terms with their corresponding gene names from Ensembl's BioMart
2) loading this into R as a data frame, along with the package GO.db
3) using GO.db's GOBPOFFSPRING to pull all of the child (descendant) terms
The resulting list of terms can then be used to filter the GO annotations downloaded from Ensembl, giving all of the genes annotated to your GO term or to any of its child terms.
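The filtering step described above can be sketched roughly as follows. The column names `gene` and `go_id`, the function name `genes_in_term_tree` and the toy table are all assumptions for illustration; adapt them to whatever your BioMart export looks like.

```r
# Genes annotated to `term` or to any of its descendant terms.
# `annot`: data frame with columns `gene` and `go_id` (assumed names).
# `offspring`: character vector of descendant GO IDs.
genes_in_term_tree <- function(annot, term, offspring) {
  wanted <- c(term, offspring)
  unique(annot$gene[annot$go_id %in% wanted])
}

# Toy table for illustration only (made-up gene and term IDs).
toy <- data.frame(gene  = c("geneA", "geneB", "geneC"),
                  go_id = c("GO:parent", "GO:child", "GO:other"),
                  stringsAsFactors = FALSE)
toy_result <- genes_in_term_tree(toy, "GO:parent", offspring = "GO:child")
print(toy_result)  # "geneA" "geneB"

# With real data (requires GO.db), e.g. for heart development:
# library(GO.db)
# offspring <- GOBPOFFSPRING[["GO:0007507"]]
# genes_in_term_tree(annot, "GO:0007507", offspring)
```

Note that GOBPOFFSPRING only covers Biological Process terms; GO.db provides GOMFOFFSPRING and GOCCOFFSPRING for the other two ontologies.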
How do I get only the BP GO terms? I am only interested in retrieving the Biological Process GO terms for a given gene ID, say 6713.
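One possible approach, as a sketch: pull the gene's GO annotations together with their ontology and keep the "BP" rows. This assumes 6713 is a human Entrez gene ID and that org.Hs.eg.db is the right annotation package; `bp_only` is a made-up helper name.

```r
# Keep only Biological Process rows from a gene-to-GO table.
# Assumes an `ONTOLOGY` column coded "BP", "MF" or "CC", as returned by
# AnnotationDbi::select() on an org.*.eg.db package.
bp_only <- function(go_table) {
  keep <- !is.na(go_table$ONTOLOGY) & go_table$ONTOLOGY == "BP"
  go_table[keep, , drop = FALSE]
}

# With real data (requires org.Hs.eg.db):
# library(org.Hs.eg.db)
# go_table <- select(org.Hs.eg.db, keys = "6713",
#                    columns = c("GO", "ONTOLOGY"), keytype = "ENTREZID")
# bp_only(go_table)
```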
Have you tried 'prepare_database' from GeneSCF?
Hello EagleEye, no, I have not tried that.