Hi,
I have about a hundred proteins/genes of Aedes albopictus for which I have the NCBI protein/mRNA RefSeq and Gene Symbol. I would like to retrieve the proper Vectorbase ID from them. In most cases, the Vectorbase entry is also associated with a Uniprot/SPTREMBL entry if that helps.
I tried to use https://biodbnet-abcc.ncifcrf.gov/db/db2dbRes.php or https://david.ncifcrf.gov/tools.jsp without success.
I would also like to follow up by finding their orthologs in Aedes aegypti.
For example, RER1 is referenced on Vectorbase here:
- AALF026993 (albopictus) where it is linked to LOC109622239 (albopictus) and its ortholog AAEL019802 (aegypti)
- LOC109622239 (albopictus) where it is linked to XP_019932051 (albopictus) and its ortholog AAEL019802 (aegypti)
I can't figure out a way to connect the dots from one to the other. Any clue how I should proceed?
EDIT: I found a solution on Vectorbase: Searches -> Genes -> Annotations -> geneIDs -> step2: Orthologs.
Since Vectorbase is an external site, unless they link their data back to NCBI you are not likely to find the information you are looking for there.
Using EntrezDirect you can get the gene homologs in Aedes; you could also start directly from the protein IDs.
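As a rough illustration of the protein-to-gene step only (not the EntrezDirect commands themselves), here is a minimal Python sketch. It assumes the public NCBI E-utilities `elink` endpoint with `dbfrom=protein` and `db=gene`, and that the response uses the usual `LinkSet/LinkSetDb/Link/Id` layout. The function names are made up for this example.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: the standard public E-utilities elink endpoint.
ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def build_elink_url(protein_accession):
    """Build an elink URL asking for Gene records linked to one protein."""
    params = {"dbfrom": "protein", "db": "gene", "id": protein_accession}
    return ELINK + "?" + urllib.parse.urlencode(params)


def parse_gene_ids(elink_xml):
    """Extract the linked numeric Gene IDs from an elink XML response."""
    root = ET.fromstring(elink_xml)
    return [el.text for el in root.findall("./LinkSet/LinkSetDb/Link/Id")]


def protein_to_gene_ids(protein_accession):
    """Query NCBI (needs network access) and return the linked Gene IDs."""
    with urllib.request.urlopen(build_elink_url(protein_accession)) as resp:
        return parse_gene_ids(resp.read())
```

Calling `protein_to_gene_ids("XP_019932051")` should then return the numeric Gene ID(s) for that protein, if NCBI has the link. NCBI's LOC symbols are "LOC" followed by the numeric Gene ID, so the result can be compared against the LOCxxxxxx names shown on Vectorbase.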
Alright, that is a little bit helpful. Unfortunately, I tried a few, and not all entries from NCBI are linked to their Vectorbase counterpart; sometimes the gene does not match any gene from either aegypti or albopictus.
I guess that's the price of studying models with poor annotation.
I'll wait to see whether someone comes up with a better idea before searching each item one by one.
Actually, would you know a way to streamline this command for several entries in parallel?
Can you post a few example IDs?
You are going to need to BLAST the albopictus proteins against the aegypti proteins to get the homologs. No way around that.
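One common way to turn such a BLAST search into putative ortholog pairs is a reciprocal best hit filter: run albopictus against aegypti and aegypti against albopictus, then keep the pairs that are each other's top hit. A minimal sketch, assuming both runs were saved in the default 12-column tabular format (`-outfmt 6`, bitscore in the last column). The function names are only for illustration.

```python
import csv
import io


def best_hits(blast_tab_text):
    """Map each query ID to (subject ID, bitscore) of its highest-scoring hit."""
    best = {}
    for row in csv.reader(io.StringIO(blast_tab_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def reciprocal_best_hits(forward_text, reverse_text):
    """Return sorted (query, subject) pairs that are each other's best hit."""
    forward = best_hits(forward_text)
    reverse = best_hits(reverse_text)
    pairs = []
    for query, (subject, _) in forward.items():
        if subject in reverse and reverse[subject][0] == query:
            pairs.append((query, subject))
    return sorted(pairs)
```

Reciprocal best hits are only a heuristic for orthology. Paralogs and fragmented gene models can still produce wrong or missing pairs, so the Vectorbase ortholog tables remain worth checking where they exist.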
For example, I tried protein SMG5-like/XP_029724968, galectin-6/XP_019541205 and annulin-like/XP_019561751.2.
I get a LOCxxxxxx ID for both aegypti and albo but no Vectorbase reference. That's already progress, but it looks like a long night is ahead of me.
For multiple IDs you can use the epost method, one ID per line.
Thank you for your help!
No chance to do the same for this command line?
FYI, I found another solution which is much simpler; see the edit in my OP.