Hi all,
I am reanalyzing a published RNA-seq dataset of a non-model organism (eel) and I have run into a roadblock regarding gene identifiers and finding orthologs.
I generally use Ensembl BioMart to find one-to-one orthologs between the query and target species (e.g. dog -> human/mouse), and then use the target identifiers for GO/GSEA/IPA downstream of differential expression analysis. However, the current dataset is from eel, which is not in Ensembl. I only have NCBI Gene IDs. My question is:
Given NCBI Gene IDs, how do I find orthologs in specific target species for approximately 6000 genes (preferably by scripting)? I would like to find the zebrafish/human/mouse orthologs of the eel genes for GO/GSEA/IPA. I checked the gene_orthologs flat file at https://ftp.ncbi.nlm.nih.gov/gene/DATA/, but my taxon of interest (taxonomy ID 7936) is not present. While some eel genes have symbols identical to their human/mouse counterparts, many genes in the eel genome are annotated as LOC + Gene ID (e.g. LOC118212896). Their gene descriptions carry a "-like" suffix (e.g. hexokinase-4-like), so there is high homology to genes in other annotated genomes, but not enough for an exact assignment. If I can get the human/mouse/zebrafish ortholog gene symbols for these "-like" genes, I can do everything else I need downstream.
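For anyone who wants to repeat the check on the gene_orthologs file, here is a minimal Python sketch. It assumes a local copy saved as gene_orthologs.gz and a tab-separated layout with five columns (tax_id, GeneID, relationship, Other_tax_id, Other_GeneID) after a header line starting with "#". The function names are made up for illustration. It looks for the taxonomy ID on either side of the ortholog pair, since a species could in principle appear only in the "Other" columns.

```python
import csv
import gzip

# Assumed column layout of the gene_orthologs flat file (tab-separated).
COLUMNS = ["tax_id", "GeneID", "relationship", "Other_tax_id", "Other_GeneID"]


def find_ortholog_rows(lines, tax_id):
    """Return every record in which tax_id appears on either side of a pair.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    wanted = str(tax_id)
    hits = []
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines, the '#'-prefixed header, and malformed rows.
        if not row or row[0].startswith("#") or len(row) < len(COLUMNS):
            continue
        record = dict(zip(COLUMNS, row))
        if record["tax_id"] == wanted or record["Other_tax_id"] == wanted:
            hits.append(record)
    return hits


def find_ortholog_rows_in_gzip(path, tax_id):
    """Same check, reading directly from a gzip-compressed local copy."""
    with gzip.open(path, "rt", newline="") as handle:
        return find_ortholog_rows(handle, tax_id)
```

Calling find_ortholog_rows_in_gzip("gene_orthologs.gz", 7936) on the downloaded file returns an empty list if the taxon is absent from both columns, which is what I meant by "not present" above.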
Thank you for any help!
Hi Clement, thanks for the tips! I will try this and come back to accept the answer once it works.