I ran an RNA differential expression analysis for a non-model organism and obtained a list of significantly differentially expressed genes, identified by their NCBI RefSeq IDs. However, I am having trouble annotating the gene functions because very little gene data is available for this species. It has no Ensembl data, and most of its content in the NCBI GenBank database is predicted. I am unsure which methods to use for annotating such genes.
I am sharing a portion of the significant genes for reference:
gene_id symbol description
122921688 LOC122921688 uncharacterized LOC122921688
122944421 LOC122944421 uncharacterized LOC122944421
122921852 LOC122921852 uncharacterized LOC122921852
122922584 LOC122922584 talin-1-like
122924299 LOC122924299 E3 ubiquitin-protein ligase TRIM39-like
122935854 SLC2A2 solute carrier family 2 member 2
122929373 LOC122929373 nuclear factor interleukin-3-regulated protein-like
122932721 SLC26A9 solute carrier family 26 member 9
122943126 MICALL1 MICAL like 1
122924116 LOC122924116 phospholipase A2 inhibitor and Ly6/PLAUR domain-containing protein-like
122937946 CDK14 cyclin dependent kinase 14
122922545 INHBB inhibin subunit beta B
122926835 LOC122926835 nicotinamide N-methyltransferase-like
122940764 LOC122940764 NACHT, LRR and PYD domains-containing protein 12-like
122944427 LOC122944427 uncharacterized LOC122944427
122935021 LOC122935021 uncharacterized LOC122935021
Are there any good ideas/tools/tutorials that I can follow for annotating genes like this?
Thank you for introducing me to such a fascinating tool - funannotate! Upon reading its manual, I discovered that the feature I require is Comparative Genomics, which compares the annotations of non-model genomes with other organisms such as human, mouse, etc.
However, I noticed that the comparative genomics function, funannotate compare, only accepts genomes annotated with funannotate (that is, the output of several funannotate annotate runs) as input. I would rather not run funannotate annotate on multiple genomes, since NCBI has already predicted and annotated the genes for me, so I am wondering whether there are any methods or tools that can perform the comparative analysis using NCBI annotations as input.
Ok, sorry, I think I misunderstood your problem. So, you have annotations, but many of your putative proteins are "unknown". This is common in non-model organisms. You could search those proteins against the Pfam database to look for known protein domains. That being said, if your organism is "super exotic", you may be quickly limited and end up with a significant number of putative proteins of unknown function anyway.
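Before running a domain search, it may help to separate the genes that actually need one from those that already carry a usable name. Below is a minimal Python sketch of that triage, using a few rows from the table in the question. The three category names and the helper functions (classify, parse_table, genes_needing_domain_search) are my own illustration, not part of any tool, and the rule that a description ending in "-like" means a homology-based name is an assumption about how these NCBI descriptions are worded.

```python
# A few rows copied from the table in the question: gene_id, symbol, description.
TABLE = """\
122921688 LOC122921688 uncharacterized LOC122921688
122922584 LOC122922584 talin-1-like
122935854 SLC2A2 solute carrier family 2 member 2
122924299 LOC122924299 E3 ubiquitin-protein ligase TRIM39-like
"""


def classify(description):
    """Assign a rough annotation category (hypothetical labels) to a description."""
    if description.startswith("uncharacterized"):
        return "uncharacterized"      # no functional hint at all
    if description.endswith("-like"):
        return "homology-inferred"    # assumption: named by similarity to a known protein
    return "named"                    # has a regular gene name


def parse_table(text):
    """Split each line into gene_id, symbol and free-text description."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The description may contain spaces, so split only on the first two gaps.
        gene_id, symbol, description = line.split(maxsplit=2)
        rows.append({
            "gene_id": gene_id,
            "symbol": symbol,
            "description": description,
            "category": classify(description),
        })
    return rows


def genes_needing_domain_search(rows):
    """Return the IDs of genes with no functional description."""
    return [r["gene_id"] for r in rows if r["category"] == "uncharacterized"]


if __name__ == "__main__":
    for row in parse_table(TABLE):
        print(row["gene_id"], row["symbol"], row["category"], sep="\t")
```

The IDs returned by genes_needing_domain_search are the ones whose protein sequences you would then fetch and search against Pfam, for example with HMMER's hmmscan. The "-like" genes already have a tentative function, so they are probably a lower priority.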