Question

How to make a .gaf file for GO analysis

9 weeks ago

noodle ▴ 650

Hi Biostars,

I need to make a .gaf file for GO analysis. The genome I must use is an annoying one: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/915/635/GCA_002915635.3_AmbMex60DD/

There is a .gaf file for a hybrid axolotl / tiger salamander, but I really need this for the AmbMex60DD assembly. Can anyone advise?

Thanks!

NGS GO • 638 views

According to NCBI, that genome isn't annotated. Where did you get the transcriptome? Unless you want to do the GO annotations yourself, you might be better off either using the annotated UKY_AmexF1_1 assembly or using an orthology-based approach to assign GO terms.

9 weeks ago by dthorbur ★ 2.9k

The issue with 'UKY_AmexF1_1' is that it's a hybrid. Are there any tools you're aware of for the orthology-based approach?

The transcriptome annotations are available from: https://www.axolotl-omics.org/assemblies

9 weeks ago by noodle ▴ 650

To my knowledge, there aren't tools specifically for orthology-based GO enrichment analyses, but the workflow would look something like this:

1. Use a tool like OrthoFinder to map orthologs between your assembly and one with well-annotated GO terms.
2. Construct your background and test gene sets from your experiment.
3. Test for GO enrichment. Tools like gProfiler2 are good for non-model systems.
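
The three steps above can be sketched in Python. This is a minimal, hypothetical example, not a replacement for OrthoFinder or gProfiler2. It assumes you have already reduced the orthology output to (query gene, reference gene) pairs, that the reference annotations are in GAF format, and that gene IDs match between files. It transfers GO terms by taking the union over all orthologs of each query gene (one simple way to handle many-to-many assignments), then runs a one-sided hypergeometric test per term. It does not propagate annotations up the GO graph or correct for multiple testing, which dedicated enrichment tools typically handle. All function names and IDs are illustrative.

```python
import math
from collections import defaultdict


def parse_gaf(lines):
    """Map DB Object ID (column 2) to its set of GO IDs (column 5)."""
    gene_to_go = defaultdict(set)
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # skip GAF header/comment lines and blanks
        cols = line.rstrip("\n").split("\t")
        if "NOT" in cols[3].split("|"):
            continue  # skip negated annotations
        gene_to_go[cols[1]].add(cols[4])
    return gene_to_go


def transfer_go(ortholog_pairs, ref_go):
    """Give each query gene the union of GO terms of its reference orthologs."""
    query_go = defaultdict(set)
    for query_gene, ref_gene in ortholog_pairs:
        query_go[query_gene] |= ref_go.get(ref_gene, set())
    return query_go


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K with the term, n drawn)."""
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return tail / math.comb(N, n)


def enrich(study, background, gene_go):
    """Return {GO ID: (k, K, p)} for every term seen in the study set."""
    background = set(background)
    study = set(study) & background
    N, n = len(background), len(study)
    K_counts, k_counts = defaultdict(int), defaultdict(int)
    for gene in background:
        for term in gene_go.get(gene, ()):
            K_counts[term] += 1  # background genes carrying the term
            if gene in study:
                k_counts[term] += 1  # study genes carrying the term
    return {term: (k, K_counts[term], hypergeom_sf(k, N, K_counts[term], n))
            for term, k in k_counts.items()}


# Toy usage: every gene and term ID below is a hypothetical placeholder.
def toy_gaf_line(gene, go_id, qualifier="enables"):
    return "\t".join(["DB", gene, gene, qualifier, go_id] + [""] * 12)


ref_go = parse_gaf([
    "!gaf-version: 2.2",
    toy_gaf_line("R1", "GO:A"),
    toy_gaf_line("R2", "GO:A"),
    toy_gaf_line("R3", "GO:B"),
    toy_gaf_line("R3", "GO:A", "NOT|enables"),  # negated, so ignored
])
query_go = transfer_go([("q1", "R1"), ("q2", "R2"), ("q3", "R3"), ("q4", "R9")], ref_go)
result = enrich(study=["q1", "q2"], background=["q1", "q2", "q3", "q4"], gene_go=query_go)
print(result)  # {'GO:A': (2, 2, 0.16666666666666666)}
```

Taking the union is permissive: a query gene with several orthologs inherits every term any of them carries, which is exactly the many-to-many effect that can cloud results.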

You can also annotate the published transcriptome with GO terms yourself. This usually isn't that painful if you have access to a good computing resource, and you then wouldn't have to deal with mapping rates or many-to-one and many-to-many orthology assignments that could cloud your results.

9 weeks ago by dthorbur ★ 2.9k

score 1 · Answer 1 · 2025-01-28

9 weeks ago

geneontologyhelp ▴ 470

Is there any way you'd be able to use the existing GAF for UKY_AmexF1_1? That is the reference genome for Ambystoma mexicanum and is what GO recommends (https://geneontology.org/docs/download-go-annotations/#2-all-other-organisms):

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/040/938/575/GCF_040938575.1_UKY_AmexF1_1/GCF_040938575.1-RS_2024_10_gene_ontology.gaf.gz

Edit: I see this is the tigrinum cross you mentioned.

If you absolutely must generate novel annotations, InterProScan is what we recommend for the best results. Please let us (GO) know if neither of these approaches works for your research.
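
For the InterProScan route, the TSV output can be converted into a GAF 2.2 file. The sketch below is a hypothetical converter, assuming InterProScan 5 was run with `-f tsv --goterms --iprlookup` so that column 12 holds the InterPro accession and column 14 the GO IDs (some releases append a source in parentheses, which the regex simply ignores). Aspect letters come from the `namespace:` lines of a GO OBO file such as go-basic.obo. The DB and Assigned By values (`MyDB`, `MyLab`), the single default qualifier per aspect, and the taxon ID (taxon:8296 for Ambystoma mexicanum; please verify) are assumptions. GO_REF:0000002 with evidence code IEA is the GO reference for InterPro2GO-derived annotations.

```python
import re
from datetime import date

GO_ID = re.compile(r"GO:\d{7}")
ASPECT = {"molecular_function": "F", "biological_process": "P",
          "cellular_component": "C"}
# Assumed default GAF 2.2 qualifier per aspect; review before real use.
QUALIFIER = {"F": "enables", "P": "involved_in", "C": "located_in"}


def load_aspects(obo_lines):
    """Map GO ID -> aspect letter (F/P/C) from the [Term] stanzas of an OBO file."""
    aspects, in_term, term_id = {}, False, None
    for line in obo_lines:
        line = line.strip()
        if line.startswith("["):
            in_term, term_id = line == "[Term]", None  # new stanza
        elif in_term and line.startswith("id: "):
            term_id = line[len("id: "):]
        elif in_term and term_id and line.startswith("namespace: "):
            aspects[term_id] = ASPECT.get(line[len("namespace: "):])
    return aspects


def interproscan_to_gaf(tsv_lines, aspects, db="MyDB", taxon="taxon:8296",
                        assigned_by="MyLab"):
    """Convert InterProScan TSV rows (GO IDs in column 14) into GAF 2.2 lines."""
    today = date.today().strftime("%Y%m%d")
    seen = set()
    gaf = ["!gaf-version: 2.2"]
    for line in tsv_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 14:
            continue  # row lacks the InterPro/GO columns
        protein, ipr = cols[0], cols[11]
        for go_id in GO_ID.findall(cols[13]):
            aspect = aspects.get(go_id)
            if aspect is None or (protein, go_id) in seen:
                continue  # term not in the OBO file, or pair already written
            seen.add((protein, go_id))
            with_from = f"InterPro:{ipr}" if ipr.startswith("IPR") else ""
            gaf.append("\t".join([
                db, protein, protein, QUALIFIER[aspect], go_id,
                "GO_REF:0000002", "IEA", with_from, aspect,
                "", "", "protein", taxon, today, assigned_by, "", ""]))
    return gaf


# Toy usage: protein, signature and InterPro values are placeholders.
obo = ["format-version: 1.2", "", "[Term]", "id: GO:0003677",
       "name: DNA binding", "namespace: molecular_function", "",
       "[Term]", "id: GO:0006355", "namespace: biological_process"]


def toy_row(protein, go_field, n_cols=14):
    cols = [protein, "md5", "300", "Pfam", "PF_TOY", "toy", "10", "80",
            "1e-10", "T", "01-01-2025", "IPR_TOY", "toy entry", go_field]
    return "\t".join(cols[:n_cols])


tsv = [toy_row("prot1", "GO:0003677(InterPro)|GO:0006355(InterPro)"),
       toy_row("prot1", "GO:0003677"),    # same term from another match
       toy_row("prot2", "-"),             # match without GO terms
       toy_row("prot3", "-", n_cols=11)]  # row without InterPro/GO columns
gaf = interproscan_to_gaf(tsv, load_aspects(obo))
print("\n".join(gaf))
```

Rows are keyed by the InterProScan protein ID; if your enrichment tool expects gene-level IDs, map transcripts to genes first, then write the lines to a file with a trailing newline.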

Hi GO help, thank you very much for the response and the direction. I might have lost my mind, but UKY_AmexF1_1 was previously listed on NCBI as a haploid 'Ambystoma mexicanum x Ambystoma tigrinum' hybrid. For some reason it is now listed only as 'Ambystoma mexicanum' and the haploid annotation has also been dropped, although tigrinum is still mentioned in the comments section. I much prefer to work with the AmbMex60DD annotation. I'll try to generate novel annotations with InterProScan and get back to you if I have further questions.

9 weeks ago by noodle ▴ 650

I ran a test job of 100 proteins, which took 160 seconds to complete on 16 cores. Does runtime scale linearly? If so, should I expect a job with my full list of 99,754 proteins to run for ~44 hours?
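
The estimate is plain linear extrapolation: 160 s / 100 proteins = 1.6 s per protein, and 1.6 s × 99,754 ≈ 159,606 s ≈ 44.3 hours. It is only approximate, since time per sequence depends on sequence length and on which member databases are run, and fixed startup cost in a small test may inflate the per-protein figure. If a cluster is available, one common way to cut wall-clock time is to split the FASTA into chunks and run them as independent jobs. A minimal sketch of both, with hypothetical helper names:

```python
def estimate_hours(test_proteins, test_seconds, total_proteins):
    """Linear extrapolation: assumes the per-protein cost seen in the test holds."""
    seconds_per_protein = test_seconds / test_proteins
    return seconds_per_protein * total_proteins / 3600


def split_fasta(lines, records_per_chunk):
    """Group FASTA records (header plus sequence lines) into chunks for separate jobs."""
    chunks, current, record = [], [], []
    for line in lines:
        if line.startswith(">") and record:
            current.append(record)  # previous record is complete
            record = []
            if len(current) == records_per_chunk:
                chunks.append(current)
                current = []
        record.append(line)
    if record:
        current.append(record)
    if current:
        chunks.append(current)
    return chunks


print(round(estimate_hours(100, 160, 99_754), 1))  # 44.3
print(len(split_fasta([">a", "MK", ">b", "MA", ">c", "MT"], 2)))  # 2
```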

9 weeks ago by noodle ▴ 650