Collecting genes that share a GO term
8.4 years ago
Farbod ★ 3.4k

Hi, I want to collect a list of all gene names (in human or in mouse) that share a GO term (e.g. DNA methylation).

Please help me with this. Thanks.

sequence gene • 4.6k views
  • Why do you want to do this?
  • What have you tried?

Dear Ram, I have a huge number of transcripts obtained from a de novo RNA-seq experiment, and I want to compare, for example, the expression of all genes with the GO term "reproduction" or "DNA methylation" between my treatments. (It is similar to a DEG analysis, but not exactly the same.)

I think you could use a GO annotation tool such as Blast2GO to annotate ALL your transcripts and then filter them by whatever criteria you like.

Hi. No, it would take several months to annotate 500,000 transcriptomes with Blast2GO, and I do not want to run a useless, resource-hungry whole-transcriptome annotation. Besides, if you have, for example, a list of human genes for "GO = reproduction" and your species of interest is non-human (a bird, reptile or fish), you can check whether those human reproduction-related genes exist in your species and, if they do, how their expression differs between the sexes in your species.

Well, "a huge number of transcripts" is not the same as 500,000 whole transcriptomes now, is it? Also, are these 500K different species? If not, would you not need to just annotate one transcriptome per species cluster and compare other cluster members to find similar transcripts (and then maybe re-BLAST non-matching transcripts)? Why do you need to annotate all 500K transcriptomes?

Thanks. I simply need a way to collect all genes (or a list of their names) annotated with one specific GO term!

Do you want to do this independent of your own data? That is what it would seem based on the way you have phrased this question.

Edit: You just answered my question. See the answer provided below.

8.4 years ago
EagleEye 7.6k

Try this tool:

Gene Set Clustering based on Functional annotation (GeneSCF)

I will use Mus musculus as an example, assuming you have Entrez Gene IDs.

Two-step process:

Download the currently available database for Mus musculus from Gene Ontology:

./prepare_database -db=GO_all -org=mgi

The above command downloads the complete GO database as a plain text file to the following location: 'geneSCF-tool/class/lib/db/mgi/'.

Gene Ontology - Biological Process

./geneSCF -m=normal -i=INPUTgene.list -t=gid -db=GO_BP -o=/ExistingOUTPUTfolder/ -org=mgi --plot=yes --background=15000

Gene Ontology - Cellular Component

./geneSCF -m=normal -i=INPUTgene.list -t=gid -db=GO_CC -o=/ExistingOUTPUTfolder/ -org=mgi --plot=yes --background=15000

Gene Ontology - Molecular Function

./geneSCF -m=normal -i=INPUTgene.list -t=gid -db=GO_MF -o=/ExistingOUTPUTfolder/ -org=mgi --plot=yes --background=15000

Gene Ontology - Complete (BP+CC+MF)

./geneSCF -m=normal -i=INPUTgene.list -t=gid -db=GO_all -o=/ExistingOUTPUTfolder/ -org=mgi --plot=yes --background=15000

The enrichment analysis results can be found in the folder 'ExistingOUTPUTfolder'.


Single-step process:

Gene Ontology - Biological Process (downloads the currently available database for Mus musculus from Gene Ontology and runs the enrichment analysis)

./geneSCF -m=update -i=INPUTgene.list -t=gid -db=GO_BP -o=/ExistingOUTPUTfolder/ -org=mgi --plot=yes --background=15000

The above command downloads the complete GO database as a plain text file to 'geneSCF-tool/class/lib/db/mgi/' and runs the enrichment analysis in the same step. The enrichment analysis results can be found in the folder 'ExistingOUTPUTfolder'.

There is no need to use update mode for subsequent runs, because the GO database for Mus musculus was already updated by the first run in 'update' mode.

Gene Ontology - Cellular Component

./geneSCF -m=normal -i=INPUTgene.list -t=gid -db=GO_CC -o=/ExistingOUTPUTfolder/ -org=mgi --plot=yes --background=15000

Gene Ontology - Molecular Function

./geneSCF -m=normal -i=INPUTgene.list -t=gid -db=GO_MF -o=/ExistingOUTPUTfolder/ -org=mgi --plot=yes --background=15000

Gene Ontology - Complete (BP+CC+MF)

./geneSCF -m=normal -i=INPUTgene.list -t=gid -db=GO_all -o=/ExistingOUTPUTfolder/ -org=mgi --plot=yes --background=15000


The parameters above should be adjusted to your data. The following can be changed:

-t=sym (gene symbols as the input list)

-t=gid (Entrez Gene IDs as the input list)

--background=#NUM (the total number of background genes in your dataset. For example, you can use the total number of protein-coding genes with a detectable expression level, irrespective of their significance; for a transcriptome- or genome-wide study, you can use the total number of annotated protein-coding genes)

For more information, please refer to the documentation: http://genescf.kandurilab.org/documentation.php
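To illustrate why the --background value matters, here is a minimal Python sketch of a one-sided over-representation test based on the hypergeometric distribution. It is not taken from GeneSCF, whose exact statistic may differ; the function name enrichment_pvalue and all the numbers are hypothetical.

```python
from math import comb


def enrichment_pvalue(background, term_genes, input_genes, overlap):
    """Probability of seeing at least `overlap` term-annotated genes when
    `input_genes` genes are drawn without replacement from `background`
    genes, `term_genes` of which carry the GO term."""
    max_overlap = min(term_genes, input_genes)
    total = comb(background, input_genes)
    tail = sum(
        comb(term_genes, k) * comb(background - term_genes, input_genes - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical numbers: 200 input genes, 12 of which are among the
# 90 background genes annotated with the term of interest.
for n_background in (15000, 20000):
    p = enrichment_pvalue(n_background, 90, 200, 12)
    print(f"background={n_background}: p={p:.3g}")
```

With the same overlap, a larger background gives a smaller p-value, so the number passed to --background should reflect the genes that could realistically have ended up in the input list.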

Since you are the author of that tool, could you provide a command line (or a set of command lines) that gets what @Farbod wants? This would help anyone who finds this thread via search in the future.

Sorry about that. I will also update the GeneSCF page on Biostars with more examples soon. Thanks for your valuable suggestion.

8.4 years ago
Guangchuang Yu ★ 2.6k

clusterProfiler can do this.

Below is an example of finding all human genes annotated with GO:0006306 (DNA methylation).

If you also want to find genes annotated with similar GO terms, you can use GOSemSim to find the similar GO terms and then use bitr to map those GO terms to gene IDs as well.


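> library(clusterProfiler)  # provides bitr(); the org.Hs.eg.db package must also be installed
> # GOALL also matches genes annotated with child terms of GO:0006306
> X <- bitr("GO:0006306", fromType="GOALL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
> head(X)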
       GOALL EVIDENCEALL ONTOLOGYALL ENTREZID
1 GO:0006306         TAS          BP      546
2 GO:0006306         IEA          BP      672
3 GO:0006306         IEA          BP     1786
4 GO:0006306         TAS          BP     1786
5 GO:0006306         TAS          BP     1787
6 GO:0006306         IDA          BP     1788
> unique(X$ENTREZID)
 [1] "546"    "672"    "1786"   "1787"   "1788"   "1789"   "2146"   "2353"  
 [9] "2778"   "2932"   "3020"   "3021"   "4152"   "4204"   "4255"   "4297"  
[17] "4552"   "5290"   "6688"   "8294"   "8350"   "8351"   "8352"   "8353"  
[25] "8354"   "8355"   "8356"   "8357"   "8358"   "8359"   "8360"   "8361"  
[33] "8362"   "8363"   "8364"   "8365"   "8366"   "8367"   "8368"   "8370"  
[41] "8468"   "8968"   "9219"   "9463"   "10155"  "10419"  "10664"  "10919" 
[49] "11022"  "11176"  "29128"  "29947"  "51409"  "53615"  "54069"  "54456" 
[57] "54496"  "54514"  "54737"  "54815"  "55124"  "55729"  "55904"  "55929" 
[65] "56165"  "57459"  "57673"  "63978"  "79813"  "79977"  "80312"  "84944" 
[73] "91646"  "121504" "122402" "126961" "132243" "136991" "140690" "143689"
[81] "163589" "200424" "201164" "221656" "333932" "346171" "359787" "554313"
[89] "653604"
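
For readers who prefer not to use R, a similar list can be sketched directly from a GO annotation file in GAF format, where column 3 is the gene symbol, column 4 the qualifier and column 5 the GO ID. The Python below is a minimal sketch and not part of clusterProfiler: the names genes_for_go_term and make_row and the sample rows (GENE_A to GENE_D) are made up for illustration. Unlike the GOALL lookup above, it only matches genes annotated directly with the given term, not with its child terms.

```python
def genes_for_go_term(gaf_lines, go_id):
    """Return the sorted gene symbols annotated directly with go_id."""
    symbols = set()
    for line in gaf_lines:
        if line.startswith("!"):  # GAF header and comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:  # skip malformed or empty lines
            continue
        symbol, qualifier, term = cols[2], cols[3], cols[4]
        if "NOT" in qualifier.split("|"):  # negated annotation
            continue
        if term == go_id:
            symbols.add(symbol)
    return sorted(symbols)


def make_row(db_id, symbol, qualifier, go_id, evidence):
    """Build the first nine GAF columns for a made-up sample row."""
    cols = ["ExampleDB", db_id, symbol, qualifier, go_id,
            "REF:placeholder", evidence, "", "P"]
    return "\t".join(cols) + "\n"


# Hypothetical annotations, for illustration only.
sample = [
    "!gaf-version: 2.2\n",
    make_row("X0001", "GENE_A", "involved_in", "GO:0006306", "IDA"),
    make_row("X0002", "GENE_B", "involved_in", "GO:0006306", "IEA"),
    make_row("X0002", "GENE_B", "involved_in", "GO:0006306", "TAS"),
    make_row("X0003", "GENE_C", "NOT|involved_in", "GO:0006306", "IDA"),
    make_row("X0004", "GENE_D", "involved_in", "GO:0000003", "IEA"),
]

print(genes_for_go_term(sample, "GO:0006306"))
```

With a real annotation file, an open file handle can be passed in place of the sample list, since the function only iterates over lines.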

Dear Guangchuang Yu, hi and thank you. It seems, however, that the "taxon" part includes species other than human. I will install your package and may ask further questions in the future.

OrgDb='org.Hs.eg.db'

org.Hs.eg.db is for human and org.Mm.eg.db is for mouse.
