How to annotate the functions of genes from a non-model organism
13 months ago
Maxine ▴ 50

I ran an RNA differential expression analysis for a non-model organism and obtained a list of significantly differentially expressed genes, identified by the NCBI gene IDs from its RefSeq annotation. However, I am having trouble annotating the gene functions because very little gene data is available for this species. It has no Ensembl annotation, and most of the annotation in NCBI is computationally predicted. I am unsure which methods to use for annotating such genes.

I am sharing a portion of the significant genes for reference:

gene_id    symbol        description
122921688  LOC122921688  uncharacterized LOC122921688
122944421  LOC122944421  uncharacterized LOC122944421
122921852  LOC122921852  uncharacterized LOC122921852
122922584  LOC122922584  talin-1-like
122924299  LOC122924299  E3 ubiquitin-protein ligase TRIM39-like
122935854  SLC2A2        solute carrier family 2 member 2
122929373  LOC122929373  nuclear factor interleukin-3-regulated protein-like
122932721  SLC26A9       solute carrier family 26 member 9
122943126  MICALL1       MICAL like 1
122924116  LOC122924116  phospholipase A2 inhibitor and Ly6/PLAUR domain-containing protein-like
122937946  CDK14         cyclin dependent kinase 14
122922545  INHBB         inhibin subunit beta B
122926835  LOC122926835  nicotinamide N-methyltransferase-like
122940764  LOC122940764  NACHT, LRR and PYD domains-containing protein 12-like
122944427  LOC122944427  uncharacterized LOC122944427
122935021  LOC122935021  uncharacterized LOC122935021

Are there any good ideas/tools/tutorials that I can follow for annotating genes like this?

gene annotation • 1.2k views
13 months ago

Yes, this is a common problem when you work with less-well-funded species: many proteins have no annotated function. You usually end up lifting over annotations from close relatives, with varying levels of success.

A few tools can tell you more about your proteins. For example, you can look at Pfam protein domains, which will tell you whether your protein of interest is, say, a kinase. Some tools combine these protein domains into functional sets; RGAugury, for example, looks at Pfam domains shared by plant disease resistance genes. InterProScan is the most commonly used tool for Pfam domain prediction.
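If you run InterProScan with TSV output, the Pfam hits can be pulled out per protein with a few lines of Python. This is only a minimal sketch: it assumes the standard InterProScan TSV column order (protein accession first, analysis name in column 4, signature accession and description in columns 5 and 6, score/e-value in column 9). The function name, the e-value cutoff, and the example rows (including the protein IDs and the `PF_EXAMPLE` accession) are made up for illustration.

```python
from collections import defaultdict

def pfam_domains_by_protein(tsv_lines, max_evalue=1e-5):
    """Group Pfam hits by protein from InterProScan TSV lines.

    Assumes the standard TSV layout: col 1 protein, col 4 analysis,
    col 5 signature accession, col 6 description, col 9 score (e-value).
    """
    domains = defaultdict(list)
    for line in tsv_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[3] != "Pfam":
            continue  # skip short lines and non-Pfam analyses
        protein, accession, description, score = (
            fields[0], fields[4], fields[5], fields[8])
        try:
            if float(score) > max_evalue:
                continue  # drop weak hits
        except ValueError:
            pass  # score reported as "-": keep the hit
        entry = (accession, description)
        if entry not in domains[protein]:
            domains[protein].append(entry)
    return dict(domains)

# Invented example rows, not real InterProScan output.
example_rows = [
    "XP_EXAMPLE_1\tmd5a\t350\tPfam\tPF00069\tProtein kinase domain"
    "\t20\t300\t1.2E-60\tT\t01-01-2024",
    "XP_EXAMPLE_1\tmd5a\t350\tSMART\tSM_EXAMPLE\tSome SMART domain"
    "\t20\t300\t3.0E-80\tT\t01-01-2024",
    "XP_EXAMPLE_2\tmd5b\t120\tPfam\tPF_EXAMPLE\tWeak example domain"
    "\t5\t90\t0.5\tT\t01-01-2024",
]

for protein, hits in pfam_domains_by_protein(example_rows).items():
    print(protein, hits)
```

With the example rows, only the kinase hit on the first protein survives the filter: the SMART row is ignored and the second protein's hit is above the e-value cutoff.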

You can also look at Gene Ontology (GO) annotation; sometimes you get a hit there. PANNZER2 seems to run well with non-model organisms (http://ekhidna2.biocenter.helsinki.fi/sanspanz/), and other tools such as Blast2GO lift GO terms over from BLAST hits. That way you might get some GO terms for your candidates, but there are obvious limits to this approach.
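To make the "lift over from BLAST hits" idea concrete, here is a rough sketch of best-hit GO transfer. It is not how PANNZER2 or Blast2GO work internally (they use more elaborate scoring); it only shows the basic principle. It assumes BLAST tabular output (`-outfmt 6`, with e-value in column 11 and bit score in column 12) and a subject-to-GO mapping that you would have to build yourself from the annotation of the reference species. The function names, cutoff, IDs, and example rows are hypothetical.

```python
def best_hits(blast_lines, max_evalue=1e-10):
    """Return {query: best subject} from BLAST outfmt 6 lines,
    keeping the highest bit score among hits below the e-value cutoff."""
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}

def transfer_go(hits, subject_go):
    """Copy the GO terms of each best hit onto the query protein."""
    return {q: sorted(subject_go.get(s, ())) for q, s in hits.items()}

# Invented example data: two queries against two reference proteins.
example_blast = [
    "Q1\tREF_A\t85.0\t300\t45\t0\t1\t300\t1\t300\t1e-150\t520",
    "Q1\tREF_B\t60.0\t280\t100\t2\t5\t285\t3\t282\t1e-80\t300",
    "Q2\tREF_B\t30.0\t100\t70\t1\t1\t100\t1\t100\t0.5\t35",
]
# Hypothetical mapping from reference protein to its GO terms.
example_go = {"REF_A": {"GO:0004672", "GO:0005524"}}

print(transfer_go(best_hits(example_blast), example_go))
```

In this toy example Q1 inherits the terms of REF_A, while Q2 gets nothing because its only hit is too weak. Terms transferred this way are predictions by similarity and should be treated with the same caution as the "-like" descriptions in the NCBI annotation.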

There are also pipelines that run a bunch of these functional annotation tools together, such as FA-nf: https://github.com/guigolab/FA-nf

13 months ago
Axzd ▴ 80

A pipeline that works quite well is funannotate: https://github.com/nextgenusfs/funannotate


Thank you for introducing me to such a fascinating tool - funannotate! Upon reading its manual, I discovered that the feature I require is Comparative Genomics, which compares the annotations of non-model genomes with other organisms such as human, mouse, etc.

However, I noticed that the comparative genomics function, funannotate compare, only accepts genomes that were annotated with funannotate itself (the output of several funannotate annotate runs) as input. Since NCBI has already predicted the genes and annotated them for me, I would rather not re-annotate everything with funannotate. Are there any methods or tools that can perform the comparative analysis using the NCBI annotations as input?


Ok, sorry, I think I misunderstood your problem. So, you have annotations, but many of your putative proteins are "unknown". This is common in non-model organisms. You could search your proteins against the Pfam database. That being said, if your organism is "super exotic", you may quickly hit the limits and still end up with a significant number of putative proteins of unknown function.
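To see how many of your genes actually fall into the "unknown" category before running anything heavier, you could triage the list by its NCBI description. A small sketch, assuming a whitespace-separated table like the one in the question (gene ID, symbol, then the description); the function name and the three category labels are my own invention.

```python
def triage(rows):
    """Split genes into three groups based on the NCBI description:
    'uncharacterized' (no function at all), 'similarity_only'
    (named by similarity, description ends in '-like'), and 'named'."""
    groups = {"uncharacterized": [], "similarity_only": [], "named": []}
    for line in rows:
        if not line.strip():
            continue
        # Split off the first two columns; the rest is the description.
        gene_id, symbol, description = line.split(None, 2)
        description = description.strip()
        if description.startswith("uncharacterized"):
            groups["uncharacterized"].append(symbol)
        elif description.endswith("-like"):
            groups["similarity_only"].append(symbol)
        else:
            groups["named"].append(symbol)
    return groups

# A few rows taken from the table in the question.
rows = [
    "122921688   LOC122921688    uncharacterized LOC122921688",
    "122922584   LOC122922584    talin-1-like",
    "122935854   SLC2A2  solute carrier family 2 member 2",
    "122940764   LOC122940764    NACHT, LRR and PYD domains-containing protein 12-like",
]

for group, symbols in triage(rows).items():
    print(group, symbols)
```

The "uncharacterized" group is where domain searches are most likely to be the only source of information; the "-like" entries already carry a similarity-based hint that you can follow up in a better-annotated relative.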
