I have a mapping of every gene in my organism to its GO IDs. Some genes do not have a GO ID. Below is an example of the format I have.
SeqName length score eValue hitName GOs ACC
Anisa00001 1185 656.751 0.0 gi|498338734|ref|WP_010652890.1|MULTISPECIES: integrase [Legionellaceae] GO:0015074,GO:0006310,GO:0003677 WP_010652890.1,YP_006506632.1,CCD06721.1,ETO94118.1
Anisa00001 1185 466.463 6.16098E-161 gi|493924933|ref|WP_006869768.1|integrase [Legionella drancourtii] GO:0015074,GO:0006310,GO:0003677 WP_006869768.1,EHL32123.1
Anisa00001 1185 424.861 1.43813E-144 gi|502743862|ref|WP_012978846.1|integrase [Legionella longbeachae] GO:0015074,GO:0006310,GO:0003677 WP_012978846.1,YP_003454139.1,CBJ10990.1
Anisa00001 1185 423.705 4.53406E-144 gi|499535659|ref|WP_011216442.1|integrase [Legionella pneumophila] GO:0015074,GO:0006310,GO:0003677 WP_011216442.1,YP_127815.1,CAH16726.1
Anisa00001 1185 419.468 2.18018E-142 gi|570283998|gb|AHE66185.1|site-specific recombinase XerD [Legionella oakridgensis ATCC 33761 = DSM 21215] GO:0015074,GO:0006310,GO:0003677 AHE66185.1,ETO93999.1
Anisa00001 1185 413.69 4.08724E-140 gi|499526807|ref|WP_011213447.1|integrase [Legionella pneumophila] GO:0015074,GO:0006310,GO:0003677 WP_011213447.1,YP_123415.1,CAH12239.1,ERB41210.1,ERH42153.1,ERI47405.1
Anisa00001 1185 413.69 6.12215E-140 gi|504092896|ref|WP_014326890.1|integrase [Legionella pneumophila] GO:0015074,GO:0006310,GO:0003677 WP_014326890.1,YP_005185414.1,AEW51315.1
Anisa00001 1185 409.068 2.69005E-138 gi|499533894|ref|WP_011215175.1|integrase [Legionella pneumophila] GO:0015074,GO:0006310,GO:0003677 WP_011215175.1,YP_126420.1,YP_006508280.1,CAH15303.1,CCD08390.1,KGP62973.1
Anisa00001 1185 404.831 1.16647E-136 gi|698843809|emb|CEG57778.1|Phage integrase family site-specific recombinase [Legionella fallonii LLAP-10] GO:0015074,GO:0006310,GO:0003677 CEG57778.1
Anisa00003 297 176.022 7.21027E-56 gi|498338732|ref|WP_010652888.1|MULTISPECIES: hypothetical protein [Legionellaceae] GO:0043565,GO:0003677 WP_010652888.1,YP_006506634.1,CCD06723.1,ETO94116.1
Anisa00003 297 120.168 9.08058E-34 gi|698843811|emb|CEG57780.1|conserved protein of unknown function [Legionella fallonii LLAP-10] GO:0043565,GO:0003677 CEG57780.1
Anisa00003 297 93.9745 3.23067E-23 gi|493924102|ref|WP_006868999.1|hypothetical protein [Legionella drancourtii] GO:0043565,GO:0003677 WP_006868999.1,EHL32770.1
Anisa00003 297 80.1073 5.8726E-18 gi|447092960|ref|WP_001170216.1|hypothetical protein [Leptospira interrogans] GO:0043565,GO:0003677 WP_001170216.1,EMM81225.1
Anisa00003 297 78.1814 4.04598E-17 gi|489065186|ref|WP_002975201.1|DNA-binding helix-turn-helix protein [Leptospira terpstrae] GO:0043565,GO:0003677 WP_002975201.1,EMY59958.1
Anisa00003 297 77.7962 5.73396E-17 gi|505585864|ref|WP_015678427.1|DNA-binding helix-turn-helix protein [Leptospira yanagawae] GO:0043565,GO:0003677 WP_015678427.1,EOQ87907.1
Anisa00003 297 76.2554 2.31193E-16 gi|523642128|ref|WP_020778299.1|DNA-binding helix-turn-helix protein [Leptospira meyeri] GO:0043565,GO:0003677 WP_020778299.1,EMJ85365.1
Anisa00003 297 75.8702 3.27591E-16 gi|489067540|ref|WP_002977532.1|DNA-binding helix-turn-helix protein [Leptospira vanthielii] GO:0043565,GO:0003677 WP_002977532.1,EMY71198.1
Anisa00003 297 75.485 3.86738E-16 gi|490606321|ref|WP_004471330.1|MULTISPECIES: Cro/C1-type HTH DNA-binding domain protein [Leptospira] GO:0043565,GO:0003677 WP_004471330.1,EKO33151.1,EKO78480.1,EMI68067.1,EMN22335.1,EMO21363.1
Anisa00003 297 75.8702 5.5926E-16 gi|488857175|ref|WP_002769485.1|hypothetical protein [Leptonema illini] GO:0043565,GO:0003677 WP_002769485.1,EHQ05131.1
Anisa00003 297 75.0998 6.57594E-16 gi|501452877|ref|WP_012476326.1|hypothetical protein [Leptospira biflexa] GO:0043565,GO:0003677 WP_012476326.1,YP_001963262.1,ABZ94684.1
Anisa00003 297 73.559 1.84265E-15 gi|685200322|gb|AIN94425.1|hypothetical protein JO40_10230 [Treponema putidum] GO:0003677 AIN94425.1
Anisa00004 1233 466.463 1.24221E-160 gi|502743845|ref|WP_012978829.1|outer membrane-specific lipoprotein transporter subunit ; membrane component of ABC superfamily [Legionella longbeachae] GO:0042954,GO:0042953,GO:0016021,GO:0016020,GO:0005886 WP_012978829.1,YP_003454078.1,CBJ10928.1
Anisa00004 1233 462.996 2.88424E-159 gi|698844250|emb|CEG58219.1|Lipoprotein-releasing system transmembrane protein LolC [Legionella fallonii LLAP-10] GO:0042954,GO:0042953,GO:0016021,GO:0016020,GO:0005886 CEG58219.1
Anisa00004 1233 459.529 6.69725E-158 gi|570285197|gb|AHE67384.1|lipoprotein releasing system, transmembrane protein, LolC/E family [Legionella oakridgensis ATCC 33761 = DSM 21215] GO:0042954,GO:0042953,GO:0016021,GO:0016020,GO:0005886 AHE67384.1,ETO93044.1
Anisa00004 1233 453.751 1.30772E-155 gi|504657357|ref|WP_014844459.1|outer membrane-specific lipoprotein ABC transporter permease [Legionella pneumophila] GO:0042954,GO:0042953,GO:0016021,GO:0016020,GO:0005886 WP_014844459.1,YP_006509488.1,CCD09630.1
Anisa00004 1233 452.981 2.6206E-155 gi|506459505|ref|WP_015961405.1|hypothetical protein [Legionella pneumophila] GO:0042954,GO:0042953,GO:0016021,GO:0016020,GO:0005886 WP_015961405.1,YP_124551.1,CAH13391.1,ERB41693.1,ERH43995.1,ERI49100.1
Anisa00004 1233 452.21 5.28713E-155 gi|499535375|ref|WP_011216187.1|hypothetical protein [Legionella pneumophila] GO:0042954,GO:0042953,GO:0016021,GO:0016020,GO:0005886 WP_011216187.1,YP_127546.1,CAH16451.1
What I want to do is an enrichment analysis for a subset of the genes above, to see which GO terms are enriched in that subset. Basically, I am looking for Fisher's exact test or a hypergeometric test. Is there any program that anyone could suggest?
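To make the test concrete, here is a minimal Python sketch of a per-term hypergeometric test (the upper tail, which is equivalent to a one-sided Fisher's exact test). It is only an illustration under some assumptions: the function names `parse_gene2go`, `hypergeom_upper_tail` and `go_enrichment` are made up for this example, the gene ID is assumed to be the first whitespace-separated token on each row, and GO IDs are picked out by the pattern `GO:` followed by seven digits. The background gene list has to be supplied by the caller, since genes without any GO ID may or may not belong in it.

```python
import re
from collections import defaultdict
from math import comb

GO_PATTERN = re.compile(r"GO:\d{7}")


def parse_gene2go(lines):
    """Collect the set of GO IDs seen for each gene across all of its hit rows.

    Assumes the gene ID is the first whitespace-separated token of a row.
    """
    gene2go = defaultdict(set)
    for line in lines:
        tokens = line.split()
        if not tokens or tokens[0] == "SeqName":  # skip blank lines and the header
            continue
        gene2go[tokens[0]].update(GO_PATTERN.findall(line))
    return gene2go


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K carry the term."""
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return numerator / comb(N, n)


def go_enrichment(subset, background, gene2go):
    """Return (term, hits in subset, hits in background, p-value), sorted by p-value."""
    background = set(background)
    subset = set(subset) & background
    term_genes = defaultdict(set)
    for gene in background:
        for term in gene2go.get(gene, ()):
            term_genes[term].add(gene)
    results = []
    for term, genes in term_genes.items():
        k = len(genes & subset)
        if k == 0:
            continue  # a term absent from the subset cannot be enriched
        p = hypergeom_upper_tail(k, len(background), len(genes), len(subset))
        results.append((term, k, len(genes), p))
    return sorted(results, key=lambda r: r[3])
```

With many GO terms tested at once, the raw p-values would still need a multiple-testing correction (for example Benjamini-Hochberg) before calling anything enriched.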
Do you have a text file or something similar that we can read? The format here is almost impossible to read, which complicates things.
Hi,
You can use the GO enrichment tool at PantherDB.org. Please check their paper in Nature Protocols (http://www.nature.com/nprot/journal/v8/n8/full/nprot.2013.092.html) for how to prepare the input file and run the analysis.
R