Hi, I want to collect a list of all gene names (in human or in mouse) that share the same or a similar GO term (e.g. DNA methylation).
Please help me in this regard. Thanks.
Try this tool:
Gene Set Clustering based on Functional annotation (GeneSCF)
I will use Mus musculus as the example, assuming you have Entrez gene IDs.
Two-step process:
Download the currently available database for Mus musculus from Gene Ontology
./prepare_database -db=GO_all -org=mgi
The above command downloads the complete GO database as a plain text file to 'geneSCF-tool/class/lib/db/mgi/'. The second step is the enrichment analysis itself, run against the ontology you need:
Gene Ontology - Biological Process
./geneSCF -m=normal -i=INPUTgene.list -t=gid -db=GO_BP -o=/ExistingOUTPUTfolder/ -org=mgi --plot=yes --background=15000
Gene Ontology - Cellular Component
./geneSCF -m=normal -i=INPUTgene.list -t=gid -db=GO_CC -o=/ExistingOUTPUTfolder/ -org=mgi --plot=yes --background=15000
Gene Ontology - Molecular Function
./geneSCF -m=normal -i=INPUTgene.list -t=gid -db=GO_MF -o=/ExistingOUTPUTfolder/ -org=mgi --plot=yes --background=15000
Gene Ontology - Complete (BP+CC+MF)
./geneSCF -m=normal -i=INPUTgene.list -t=gid -db=GO_all -o=/ExistingOUTPUTfolder/ -org=mgi --plot=yes --background=15000
The results of the enrichment analysis can be found in the folder 'ExistingOUTPUTfolder'.
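The per-ontology runs above differ only in the -db value, so they can be wrapped in a small loop. This is a minimal sketch that reuses exactly the flags shown above; the function name run_go_enrichment and the DRY_RUN variable are made up for this sketch and are not part of GeneSCF. By default it only prints the commands, so nothing is executed until DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch: build one geneSCF call per GO sub-ontology.
# run_go_enrichment and DRY_RUN are hypothetical helpers, not geneSCF options.
run_go_enrichment() {
    input=$1
    outdir=$2
    background=$3
    for db in GO_BP GO_CC GO_MF; do
        # Reuse the positional parameters to hold the command words.
        set -- ./geneSCF -m=normal -i="$input" -t=gid -db="$db" \
            -o="$outdir" -org=mgi --plot=yes --background="$background"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Dry run (default): print the command instead of running it.
            echo "$*"
        else
            "$@"
        fi
    done
}

# Prints the three commands for the example values used in this answer.
run_go_enrichment INPUTgene.list /ExistingOUTPUTfolder/ 15000
```

Run it from the GeneSCF folder with DRY_RUN=0 once the printed commands look right.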
Single-step process:
Gene Ontology - Biological Process (downloads the currently available database for Mus musculus from Gene Ontology and runs the enrichment analysis)
./geneSCF -m=update -i=INPUTgene.list -t=gid -db=GO_BP -o=/ExistingOUTPUTfolder/ -org=mgi --plot=yes --background=15000
The above command downloads the complete GO database as a plain text file to 'geneSCF-tool/class/lib/db/mgi/' and also runs the enrichment analysis. The results of the enrichment analysis can be found in the folder 'ExistingOUTPUTfolder'.
There is no need to use update mode for subsequent runs, because the GO database for Mus musculus was already updated when you used 'update' mode on the first run.
Gene Ontology - Cellular Component
./geneSCF -m=normal -i=INPUTgene.list -t=gid -db=GO_CC -o=/ExistingOUTPUTfolder/ -org=mgi --plot=yes --background=15000
Gene Ontology - Molecular Function
./geneSCF -m=normal -i=INPUTgene.list -t=gid -db=GO_MF -o=/ExistingOUTPUTfolder/ -org=mgi --plot=yes --background=15000
Gene Ontology - Complete (BP+CC+MF)
./geneSCF -m=normal -i=INPUTgene.list -t=gid -db=GO_all -o=/ExistingOUTPUTfolder/ -org=mgi --plot=yes --background=15000
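The choice between 'update' on the first run and 'normal' afterwards can be scripted. The sketch below assumes that a non-empty 'geneSCF-tool/class/lib/db/mgi/' folder means the database has already been downloaded; it does not check whether that copy is current. The helper name pick_mode is hypothetical, and the final line only prints the resulting command.

```shell
#!/bin/sh
# Sketch: pick the geneSCF mode from the state of the local database folder.
# pick_mode is a hypothetical helper, not part of GeneSCF.
pick_mode() {
    dbdir=$1
    # A folder that exists and contains at least one entry counts as "downloaded".
    if [ -d "$dbdir" ] && [ -n "$(ls -A "$dbdir" 2>/dev/null)" ]; then
        echo normal
    else
        echo update
    fi
}

mode=$(pick_mode geneSCF-tool/class/lib/db/mgi/)
# Print the command rather than running it, so it can be checked first.
echo "./geneSCF -m=$mode -i=INPUTgene.list -t=gid -db=GO_BP -o=/ExistingOUTPUTfolder/ -org=mgi --plot=yes --background=15000"
```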
The parameters above should be changed to suit your data. The following can be altered:
-t=sym (for Gene Symbol as input list)
-t=gid (for Entrez Geneid as input list)
--background=#NUM (the total number of background genes in your dataset; for example, the number of protein-coding genes with detectable expression, irrespective of their significance, or, for a transcriptome- or genome-wide study, the total number of annotated protein-coding genes)
For more information, please refer to the documentation: http://genescf.kandurilab.org/documentation.php
clusterProfiler can do this.
Below is an example of finding all human genes annotated with GO:0006306 (DNA methylation).
If you also want genes annotated with similar GO terms, you can use GOSemSim to find the similar GO terms and then use bitr again to map those GO terms to gene IDs.
require(clusterProfiler)
X <- bitr("GO:0006306", fromType="GOALL", toType="ENTREZID", OrgDb='org.Hs.eg.db')
> head(X)
       GOALL EVIDENCEALL ONTOLOGYALL ENTREZID
1 GO:0006306         TAS          BP      546
2 GO:0006306         IEA          BP      672
3 GO:0006306         IEA          BP     1786
4 GO:0006306         TAS          BP     1786
5 GO:0006306         TAS          BP     1787
6 GO:0006306         IDA          BP     1788
> unique(X$ENTREZID)
 [1] "546"    "672"    "1786"   "1787"   "1788"   "1789"   "2146"   "2353"
 [9] "2778"   "2932"   "3020"   "3021"   "4152"   "4204"   "4255"   "4297"
[17] "4552"   "5290"   "6688"   "8294"   "8350"   "8351"   "8352"   "8353"
[25] "8354"   "8355"   "8356"   "8357"   "8358"   "8359"   "8360"   "8361"
[33] "8362"   "8363"   "8364"   "8365"   "8366"   "8367"   "8368"   "8370"
[41] "8468"   "8968"   "9219"   "9463"   "10155"  "10419"  "10664"  "10919"
[49] "11022"  "11176"  "29128"  "29947"  "51409"  "53615"  "54069"  "54456"
[57] "54496"  "54514"  "54737"  "54815"  "55124"  "55729"  "55904"  "55929"
[65] "56165"  "57459"  "57673"  "63978"  "79813"  "79977"  "80312"  "84944"
[73] "91646"  "121504" "122402" "126961" "132243" "136991" "140690" "143689"
[81] "163589" "200424" "201164" "221656" "333932" "346171" "359787" "554313"
[89] "653604"
The OrgDb argument selects the species: org.Hs.eg.db is for human and org.Mm.eg.db is for mouse.
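If R is not at hand, a similar lookup can be approximated on the command line from a GO annotation (GAF) file. This is a minimal sketch, assuming a tab-separated GAF 2.x file in which column 3 is the gene symbol, column 4 the qualifier and column 5 the GO ID; the function name genes_for_go and the file name in the usage comment are placeholders. Unlike GOALL, it matches only direct annotations, so genes annotated to child terms of the GO ID are not included.

```shell
#!/bin/sh
# Sketch: print the unique gene symbols directly annotated with one GO ID.
# genes_for_go is a hypothetical helper; the input is assumed to be GAF 2.x.
# usage: genes_for_go GO:0006306 goa_human.gaf
genes_for_go() {
    # Skip "!" header lines and NOT-qualified rows, keep rows whose
    # GO ID (column 5) matches, then print the symbol (column 3) once each.
    awk -F '\t' -v go="$1" \
        '!/^!/ && $4 !~ /NOT/ && $5 == go { print $3 }' "$2" | sort -u
}
```

The output is one symbol per line, sorted, which can be saved to a file and used as an input gene list.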
Dear Ram, I have a huge number of transcripts obtained from a de novo RNA-seq experiment, and I want to check, for example, the expression of all genes with the GO term reproduction or DNA methylation between my treatments. (It is similar to a DEG analysis, but not exactly the same.)
I think you could use GO annotation tools like Blast2GO to annotate ALL your transcripts and then filter them by any criteria.
Hi, no. It would take several months to annotate 500000 transcriptomes with Blast2GO, and I do not want to run a useless, resource-hungry whole-transcriptome annotation. Besides, if you have, for example, a list of human genes with "GO = reproduction" and your species of interest is non-human (a bird, reptile or fish), you can check whether those human reproduction-related genes exist in your species and, if so, how their expression differs between the sexes in your species!
Well, "a huge number of transcripts" is not the same as 500,000 whole transcriptomes now, is it? Also, are these 500K different species? If not, would you not need to just annotate one transcriptome per species cluster and compare other cluster members to find similar transcripts (and then maybe re-BLAST non-matching transcripts)? Why do you need to annotate all 500K transcriptomes?
Thanks, I simply need a way to collect all genes (or a list of names) for one specific GO term!
Do you want to do this independent of your own data? That is what it would seem based on the way you have phrased this question.
Edit: You just answered my question. See the answer provided below.