Retrieve Vectorbase ID from NCBI ID [Solved]
3.6 years ago
benoahb ▴ 40

Hi,

I have about a hundred Aedes albopictus proteins/genes for which I have the NCBI protein/mRNA RefSeq accession and the gene symbol. I would like to retrieve the corresponding Vectorbase ID for each of them. In most cases, the Vectorbase entry is also associated with a UniProt/SPTREMBL entry, if that helps.

I tried to use https://biodbnet-abcc.ncifcrf.gov/db/db2dbRes.php or https://david.ncifcrf.gov/tools.jsp without success.

I would also like to follow up by finding their orthologs in Aedes aegypti.

For example, RER1 is referenced on Vectorbase here:

  • AALF026993 (albopictus) where it is linked to LOC109622239 (albopictus) and its ortholog AAEL019802 (aegypti)
  • LOC109622239 (albopictus) where it is linked to XP_019932051 (albopictus) and its ortholog AAEL019802 (aegypti)

I can't figure out a way to connect the dots from one to the other. Any clue how I should proceed?

EDIT: I found a solution on Vectorbase: Searches -> Genes -> Annotations -> geneIDs -> step2: Orthologs.

refseq identifier vectorbase database • 1.7k views

Since Vectorbase is an external site, NCBI is unlikely to have the information you are looking for unless Vectorbase links its data back to NCBI.

Using EntrezDirect, you can get the gene homologs in Aedes as follows:

$ esearch -db gene -query "RER1" | efetch -format ft | grep -B 1 -A 3 Aedes
85. LOC109622239
protein RER1 [Aedes albopictus (Asian tiger mosquito)]
Other Designations: protein RER1
Chromosome: Un
ID: 109622239
--
--
164. LOC5573266
protein RER1 [Aedes aegypti (yellow fever mosquito)]
Other Aliases: AaeL_AAEL010361, AAEL010361
Other Designations: protein RER1
Chromosome: 3
--
--
128. LOC109431573
protein RER1 [Aedes albopictus (Asian tiger mosquito)]
This record was replaced with GeneID: 109622239
ID: 109431573

Starting with protein IDs, you could do the following:

$ esearch -db protein -query "XP_019932051" | elink -target gene | efetch -format ft

1. LOC109622239
protein RER1 [Aedes albopictus (Asian tiger mosquito)]
Other Designations: protein RER1
Chromosome: Un
ID: 109622239
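
A possible refinement, sketched under assumptions: `grep -B 1 -A 3` keeps a fixed window around each match, so longer records can lose their trailing lines (the Aedes aegypti RER1 record above is cut off before its ID line). If the report separates records with blank lines, as the multi-record output elsewhere in this thread suggests, awk's paragraph mode can keep each matching record whole. `aedes_records` is a hypothetical helper name, not part of EntrezDirect.

```shell
#!/bin/sh
# aedes_records: hypothetical helper. Reads an efetch text report on stdin and
# prints every blank-line-separated record that mentions the given organism
# (default: Aedes) inside square brackets.
aedes_records() {
    # RS="" puts awk in paragraph mode: one record per blank-line-separated block.
    awk -v org="${1:-Aedes}" '
        BEGIN { RS = ""; ORS = "\n\n" }
        index($0, "[" org) { print }
    '
}

# Intended use (needs EntrezDirect and network access):
#   esearch -db gene -query "RER1" | efetch -format ft | aedes_records "Aedes"
```

Passing a full species name, for example `aedes_records "Aedes aegypti"`, would restrict the output to one species.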

Alright, that helps a little. Unfortunately, I tried a few, and not all NCBI entries are linked to their Vectorbase counterparts; sometimes the gene does not match any gene from either aegypti or albopictus.

I guess that's the price of studying models with poor annotation.

I'll wait to see whether someone comes up with a better idea before searching each item one by one.

Actually, would you know a way to streamline this command for several entries at once?


Can you post a few example IDs?

You are going to need to BLAST the albopictus proteins against aegypti to get the homologs. There is no way around that.
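
One way to turn the BLAST suggestion into candidate homolog pairs, sketched under assumptions: the albopictus proteins have been searched against an aegypti protein database with `blastp -outfmt 6`, whose default tabular layout puts the query ID in column 1, the subject ID in column 2, and the bit score in column 12. `best_hits` is a hypothetical helper name, and the file and database names in the usage comment are placeholders. A single best hit is only a candidate; a reciprocal search would be needed before calling it an ortholog.

```shell
#!/bin/sh
# best_hits: hypothetical helper. Reads BLAST tabular output (-outfmt 6) on
# stdin and prints query<TAB>subject<TAB>bitscore for the highest-scoring hit
# of each query. Ties keep the first hit seen; query order is preserved.
best_hits() {
    awk -F '\t' '
        !($1 in score) { order[++n] = $1 }
        !($1 in score) || ($12 + 0) > (score[$1] + 0) { score[$1] = $12; hit[$1] = $2 }
        END {
            for (i = 1; i <= n; i++) {
                q = order[i]
                printf "%s\t%s\t%s\n", q, hit[q], score[q]
            }
        }
    '
}

# Intended use (needs BLAST+; file and database names are placeholders):
#   blastp -query albopictus.faa -db aegypti_proteins -outfmt 6 -evalue 1e-5 | best_hits
```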


For example, I tried protein SMG5-like/XP_029724968, galectin-6/XP_019541205, and annulin-like/XP_019561751.2.

I get a LOCxxxxxx ID for both aegypti and albopictus but no Vectorbase reference. That's already progress, but it looks like a long night is ahead of me.


For multiple IDs, you can use the epost method, with one ID per line in a file.

$ more id
XP_029724968
XP_019541205
XP_019561751.2

$ cat id | epost -db protein -format acc | elink -target gene | efetch -format ft


1. LOC109412016
galectin-6 [Aedes albopictus (Asian tiger mosquito)]
Other Designations: galectin-6
Chromosome: Un
ID: 109412016

2. LOC109621961
protein SMG5-like [Aedes albopictus (Asian tiger mosquito)]
Other Designations: protein SMG5-like
Chromosome: Un
ID: 109621961

3. LOC109430163
annulin-like [Aedes albopictus (Asian tiger mosquito)]
Other Designations: annulin-like
Chromosome: Un
ID: 109430163
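
A caveat worth a sketch: the batch report above does not come back in input order (galectin-6 is listed first, although the first accession in the file, XP_029724968, is the SMG5-like protein), and it does not repeat the protein accessions. For about a hundred IDs, one lookup per accession keeps the pairing explicit at the cost of speed. In the sketch below, `lookup_gene` wraps the same pipeline used earlier in this thread, while `map_accessions` is a hypothetical helper name; only the first gene of each report is kept, which is an assumption that each protein links to a single gene.

```shell
#!/bin/sh
# lookup_gene: gene report for one protein accession, using the same
# EntrezDirect pipeline as above (needs EntrezDirect and network access).
lookup_gene() {
    esearch -db protein -query "$1" | elink -target gene | efetch -format ft
}

# map_accessions: hypothetical helper. Reads one accession per line on stdin
# and prints accession<TAB>gene symbol, or NA when no gene record comes back.
map_accessions() {
    while IFS= read -r acc; do
        [ -n "$acc" ] || continue
        # < /dev/null guards against the lookup reading from the ID list.
        symbol=$(lookup_gene "$acc" < /dev/null |
            awk '/^[0-9]+\. / { print $2; exit }')
        printf '%s\t%s\n' "$acc" "${symbol:-NA}"
    done
}

# Intended use:
#   map_accessions < id > accession_to_gene.tsv
```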

Thank you for your help!

Is there any chance of doing the same for this command line?

$ esearch -db gene -query "RER1" | efetch -format ft | grep -B 1 -A 3 Aedes

You can loop over the search terms, one per line in a file:

$ cat id
SMG5
galectin-6
annulin


$ for i in $(cat id); do printf '%s\n\n' "${i}"; esearch -db gene -query "${i}" | efetch -format ft | grep -B 1 -A 3 "Aedes"; printf '\n\n'; done
SMG5

265. LOC5565770
protein SMG5 [Aedes aegypti (yellow fever mosquito)]
Other Aliases: AaeL_AAEL004986, AAEL004986
Other Designations: protein SMG5
Chromosome: 2
--
--
426. LOC115267272
protein SMG5-like [Aedes albopictus (Asian tiger mosquito)]
Chromosome: Un
ID: 115267272

--
--
62. LOC109621961
protein SMG5-like [Aedes albopictus (Asian tiger mosquito)]
Other Designations: protein SMG5-like
Chromosome: Un
ID: 109621961
--
--
71. LOC109415051
protein SMG5 [Aedes albopictus (Asian tiger mosquito)]
Other Designations: protein SMG5
Chromosome: Un
ID: 109415051
--
--
377. LOC5569201
muscle M-line assembly protein unc-89 [Aedes aegypti (yellow fever mosquito)]
Other Aliases: AaeL_AAEL007471, AAEL007471
Other Designations: muscle M-line assembly protein unc-89
Chromosome: 1


galectin-6

4. LOC109412016
galectin-6 [Aedes albopictus (Asian tiger mosquito)]
Other Designations: galectin-6
Chromosome: Un
ID: 109412016
--
--
78. LOC109418224
galectin-6-like [Aedes albopictus (Asian tiger mosquito)]
This record was replaced with GeneID: 109427989
ID: 109418224

--
--
79. LOC109407960
galectin-6-like [Aedes albopictus (Asian tiger mosquito)]
Other Designations: galectin-6-like
Chromosome: Un
ID: 109407960


annulin

2. LOC5568609
annulin [Aedes aegypti (yellow fever mosquito)]
Other Aliases: AaeL_AAEL006978, AAEL006978
Other Designations: annulin
Chromosome: 2
--
--
160. LOC110681051
annulin-like [Aedes aegypti (yellow fever mosquito)]
Other Designations: annulin-like
Chromosome: Un
ID: 110681051
--
--
161. LOC110676966
annulin-like [Aedes aegypti (yellow fever mosquito)]
Other Designations: annulin-like
Chromosome: 2
Annotation: Chromosome 2 NC_035108.1 (399149919..399155046, complement)
--
--
169. LOC109430163
annulin-like [Aedes albopictus (Asian tiger mosquito)]
Other Designations: annulin-like
Chromosome: Un
ID: 109430163
--
--
170. LOC109430162
annulin [Aedes albopictus (Asian tiger mosquito)]
Other Designations: LOW QUALITY PROTEIN: annulin
Chromosome: Un
ID: 109430162
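
Since the goal is a Vectorbase ID, it may help to pull out the AAEL identifiers that the Aedes aegypti records above list under "Other Aliases". The sketch below does that for either the full report or the grep-filtered output. `aael_ids` is a hypothetical helper name, and treating these aliases as current Vectorbase IDs is an assumption: the RER1 record earlier in the thread lists AAEL010361, whereas the question cites AAEL019802, so each hit is a lead to verify rather than a confirmed mapping. The albopictus records shown here carry no such alias, so they produce no rows.

```shell
#!/bin/sh
# aael_ids: hypothetical helper. Reads an efetch text report (or grep-filtered
# excerpts of it) on stdin and prints gene symbol<TAB>AAEL alias for every
# record that has a standalone AAEL identifier under "Other Aliases".
aael_ids() {
    awk -F ': ' '
        # Header line such as "265. LOC5565770": remember the gene symbol.
        /^[0-9]+\. / { symbol = $0; sub(/^[0-9]+\. /, "", symbol) }
        /^Other Aliases: / {
            n = split($2, parts, /, */)
            for (j = 1; j <= n; j++)
                if (parts[j] ~ /^AAEL[0-9]+$/) print symbol "\t" parts[j]
        }
    '
}

# Intended use (needs EntrezDirect and network access):
#   esearch -db gene -query "SMG5" | efetch -format ft | aael_ids
```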

FYI, I found another solution that is much simpler; see the edit in my original post.
