Question

Collecting Genes with similar GO term

8.4 years ago

Farbod ★ 3.4k

Hi, I want to collect a list of all gene names (in human or in mouse) that share a GO term (e.g. DNA methylation).

Please help me with this. Thanks.

sequence gene • 4.6k views

ADD COMMENT • link updated 8.4 years ago by EagleEye 7.6k • written 8.4 years ago by Farbod ★ 3.4k

Why do you want to do this?
What have you tried?

ADD REPLY • link 8.4 years ago by Ram 44k

Dear Ram, I have a huge number of transcripts obtained from a de novo RNA-seq experiment, and I want to check, for example, the expression of all genes with the GO term "reproduction" or "DNA methylation" between my treatments. (It is similar to a DEG analysis, but not exactly the same.)

ADD REPLY • link 8.4 years ago by Farbod ★ 3.4k

I think you could use GO annotation tools like Blast2GO to annotate ALL your transcripts and then filter them by any criteria.

ADD REPLY • link 8.4 years ago by Ram 44k

Hi, no. It would take several months to annotate 500000 transcriptomes with Blast2GO, and I do not want to run a useless, resource-hungry whole-transcriptome annotation. Besides, if you have, for example, a list of human genes for "GO = reproduction" and your species of interest is non-human (a bird, reptile or fish), you can check whether those human reproduction-related genes exist in your species and, if so, how their expression differs between the sexes in your species.

ADD REPLY • link 8.4 years ago by Farbod ★ 3.4k

Well, "a huge number of transcripts" is not the same as 500,000 whole transcriptomes now, is it? Also, are these 500K different species? If not, would you not need to just annotate one transcriptome per species cluster and compare other cluster members to find similar transcripts (and then maybe re-BLAST non-matching transcripts)? Why do you need to annotate all 500K transcriptomes?

ADD REPLY • link 8.4 years ago by Ram 44k

Thanks. I simply need a way to collect all genes (or a list of their names) for one specific GO term.

ADD REPLY • link 8.4 years ago by Farbod ★ 3.4k

Do you want to do this independent of your own data? That is what it would seem based on the way you have phrased this question.

Edit: You just answered my question. See the answer provided below.

ADD REPLY • link 8.4 years ago by GenoMax 147k

score 3 · Answer 1 · 2016-07-11

Try this tool:

Gene Set Clustering based on Functional annotation (GeneSCF)

I will use Mus musculus as the example, assuming you have Entrez Gene IDs.

Two-step process:

Download the currently available database for Mus musculus from Gene Ontology:

./prepare_database -db=GO_all -org=mgi

The above command downloads the complete GO database as a plain text file to the following location: 'geneSCF-tool/class/lib/db/mgi/'.
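If all you need is the plain gene list for one term, a downloaded annotation file can also be filtered directly. Below is a minimal sketch, assuming a GAF 2.x file (tab-separated, gene symbol in column 3, qualifier in column 4, GO ID in column 5). The file written by prepare_database may use a different layout, so check it first. The function name genes_for_go_term and the file name mgi.gaf are made up for illustration.

```shell
#!/bin/sh
# genes_for_go_term GO_ID GAF_FILE
# Prints the unique gene symbols (column 3) annotated directly with GO_ID (column 5).
# ASSUMPTION: GAF 2.x layout, tab-separated, comment lines start with "!".
genes_for_go_term() {
    awk -F '\t' -v term="$1" '
        /^!/ { next }                           # skip header/comment lines
        $5 == term && $4 !~ /NOT/ { print $3 }  # drop negated ("NOT") annotations
    ' "$2" | sort -u
}

# Example call (hypothetical file name):
#   genes_for_go_term GO:0006306 mgi.gaf > dna_methylation_genes.txt
```

Note that this only catches direct annotations; genes annotated to child terms of the GO ID are not included.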

Gene Ontology - Biological Process

./geneSCF -m=normal -i=INPUTgene.list -t=gid -db=GO_BP -o=/ExistingOUTPUTfolder/ -org=mgi --plot=yes --background=15000

Gene Ontology - Cellular Component

./geneSCF -m=normal -i=INPUTgene.list -t=gid -db=GO_CC -o=/ExistingOUTPUTfolder/ -org=mgi --plot=yes --background=15000

Gene Ontology - Molecular Function

./geneSCF -m=normal -i=INPUTgene.list -t=gid -db=GO_MF -o=/ExistingOUTPUTfolder/ -org=mgi --plot=yes --background=15000

Gene Ontology - Complete (BP+CC+MF)

./geneSCF -m=normal -i=INPUTgene.list -t=gid -db=GO_all -o=/ExistingOUTPUTfolder/ -org=mgi --plot=yes --background=15000

The results of the enrichment analysis can be found in the folder 'ExistingOUTPUTfolder'.

Single-step process:

Gene Ontology - Biological Process (downloads the currently available database for Mus musculus from Gene Ontology, then runs the enrichment analysis)

./geneSCF -m=update -i=INPUTgene.list -t=gid -db=GO_BP -o=/ExistingOUTPUTfolder/ -org=mgi --plot=yes --background=15000

The above command downloads the complete GO database as a plain text file to 'geneSCF-tool/class/lib/db/mgi/' and also runs the enrichment analysis. The results of the enrichment analysis can be found in the folder 'ExistingOUTPUTfolder'.

There is no need to use update mode for subsequent runs, since the GO database for Mus musculus was already updated by the first run in 'update' mode.

Gene Ontology - Cellular Component

./geneSCF -m=normal -i=INPUTgene.list -t=gid -db=GO_CC -o=/ExistingOUTPUTfolder/ -org=mgi --plot=yes --background=15000

Gene Ontology - Molecular Function

./geneSCF -m=normal -i=INPUTgene.list -t=gid -db=GO_MF -o=/ExistingOUTPUTfolder/ -org=mgi --plot=yes --background=15000

Gene Ontology - Complete (BP+CC+MF)

./geneSCF -m=normal -i=INPUTgene.list -t=gid -db=GO_all -o=/ExistingOUTPUTfolder/ -org=mgi --plot=yes --background=15000

Change the parameters above to suit your data. The following can be altered:

-t=sym (gene symbols as the input list)

-t=gid (Entrez Gene IDs as the input list)

--background=#NUM (the total number of background genes in your dataset; for example, the total number of protein-coding genes with a detectable expression level, irrespective of their significance, or, for a transcriptome- or genome-wide study, the total number of annotated protein-coding genes)
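The background number can be counted from a file rather than typed by hand. This is a small sketch, assuming your background genes are listed one identifier per line in a plain text file. The function name count_background and the file name expressed_genes.txt are hypothetical.

```shell
#!/bin/sh
# count_background FILE
# Prints the number of distinct, non-empty gene identifiers in FILE.
# ASSUMPTION: FILE holds one identifier per line.
count_background() {
    grep -v '^[[:space:]]*$' "$1" | sort -u | wc -l | tr -d ' '
}

# Example call (hypothetical file name), reusing the flags shown above:
#   ./geneSCF -m=normal -i=INPUTgene.list -t=gid -db=GO_BP -o=/ExistingOUTPUTfolder/ \
#       -org=mgi --plot=yes --background="$(count_background expressed_genes.txt)"
```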

For more information, please refer to the documentation: http://genescf.kandurilab.org/documentation.php

score 1 · Answer 2 · 2016-07-11

clusterProfiler can do this.

Below is an example that finds all human genes annotated with GO:0006306, DNA methylation.

If you also want to find genes annotated with similar GO terms, you can use GOSemSim to find the similar GO terms and then use bitr again to map those GO terms to gene IDs.

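library(clusterProfiler)
library(org.Hs.eg.db)  # human annotation package used as OrgDb
# GOALL also counts genes annotated to child terms of the GO ID
X <- bitr("GO:0006306", fromType="GOALL", toType="ENTREZID", OrgDb="org.Hs.eg.db")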

> head(X)
       GOALL EVIDENCEALL ONTOLOGYALL ENTREZID
1 GO:0006306         TAS          BP      546
2 GO:0006306         IEA          BP      672
3 GO:0006306         IEA          BP     1786
4 GO:0006306         TAS          BP     1786
5 GO:0006306         TAS          BP     1787
6 GO:0006306         IDA          BP     1788
> unique(X$ENTREZID)
 [1] "546"    "672"    "1786"   "1787"   "1788"   "1789"   "2146"   "2353"  
 [9] "2778"   "2932"   "3020"   "3021"   "4152"   "4204"   "4255"   "4297"  
[17] "4552"   "5290"   "6688"   "8294"   "8350"   "8351"   "8352"   "8353"  
[25] "8354"   "8355"   "8356"   "8357"   "8358"   "8359"   "8360"   "8361"  
[33] "8362"   "8363"   "8364"   "8365"   "8366"   "8367"   "8368"   "8370"  
[41] "8468"   "8968"   "9219"   "9463"   "10155"  "10419"  "10664"  "10919" 
[49] "11022"  "11176"  "29128"  "29947"  "51409"  "53615"  "54069"  "54456" 
[57] "54496"  "54514"  "54737"  "54815"  "55124"  "55729"  "55904"  "55929" 
[65] "56165"  "57459"  "57673"  "63978"  "79813"  "79977"  "80312"  "84944" 
[73] "91646"  "121504" "122402" "126961" "132243" "136991" "140690" "143689"
[81] "163589" "200424" "201164" "221656" "333932" "346171" "359787" "554313"
[89] "653604"
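To check which of these GO-term genes are present in your own data, the two ID lists can be intersected. This is a rough sketch, assuming both lists are plain text files with one identifier per line and use the same ID type. The function name genes_in_both and the file names are hypothetical.

```shell
#!/bin/sh
# genes_in_both LIST_A LIST_B
# Prints the identifiers that occur in both files, sorted and de-duplicated.
# ASSUMPTION: one identifier per line, same ID type in both files.
genes_in_both() {
    gib_a=$(mktemp) || return 1
    gib_b=$(mktemp) || { rm -f "$gib_a"; return 1; }
    sort -u "$1" > "$gib_a"
    sort -u "$2" > "$gib_b"
    comm -12 "$gib_a" "$gib_b"   # keep only lines common to both sorted lists
    rm -f "$gib_a" "$gib_b"
}

# Example call (hypothetical file names):
#   genes_in_both go_0006306_entrez.txt my_expressed_entrez.txt
```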