Hi,
I'm working on a project in which I want to find where proteins, for which I have nucleotide sequences from one fish species, are located (chromosome and position) on the Danio rerio (zebrafish) genome. I BLAST my sequences against the Danio rerio transcriptome, extracted from the 'nr' database, and I then get gene IDs in the following format:
gi|47087391|ref|NP_998590.1|
gi|56090491|ref|NP_001007792.1|
gi|169154248|emb|CAQ15172.1|
gi|189523697|ref|XP_001341635.2|
gi|189526610|ref|XP_687146.3|
From these, I would like to get the chromosome and the position on that chromosome of each of these genes in the Danio rerio genome (Zv9). Since I have close to a thousand of these IDs, I want to automate the process.
I can browse the zebrafish genome on different genome browsers, but how can I automate my search?
Many thanks
If they come from the nr database, then those are not "gene IDs". The gi number is a unique identifier for a sequence record in the protein database; the second part (after "ref" or "emb") is a protein accession.
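To make that structure concrete, here is a minimal Python sketch that splits identifiers of this form into the gi number, the source tag, and the protein accession. The names `NrId` and `parse_nr_id` are made up for illustration, and the sketch assumes every identifier follows the four-field `gi|number|tag|accession|` layout shown in the question. It only parses the strings; it does not look up any genomic coordinates.

```python
from typing import NamedTuple


class NrId(NamedTuple):
    """Hypothetical container for the parts of an nr-style identifier."""
    gi: str         # numeric gi identifier, kept as a string
    source: str     # source database tag, e.g. "ref" or "emb"
    accession: str  # protein accession with version, e.g. "NP_998590.1"


def parse_nr_id(header: str) -> NrId:
    """Split an identifier such as 'gi|47087391|ref|NP_998590.1|' into its parts."""
    # Drop surrounding whitespace and the trailing pipe before splitting.
    fields = header.strip().strip("|").split("|")
    if len(fields) < 4 or fields[0] != "gi":
        raise ValueError(f"unexpected identifier format: {header!r}")
    return NrId(gi=fields[1], source=fields[2], accession=fields[3])


# Identifiers copied from the question.
ids = [
    "gi|47087391|ref|NP_998590.1|",
    "gi|56090491|ref|NP_001007792.1|",
    "gi|169154248|emb|CAQ15172.1|",
    "gi|189523697|ref|XP_001341635.2|",
    "gi|189526610|ref|XP_687146.3|",
]

for raw in ids:
    rec = parse_nr_id(raw)
    print(rec.gi, rec.source, rec.accession)
```

The accession column is the part you would then feed to whatever protein-to-gene mapping resource you choose.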
OK, noted. Given your answer, I looked for another option and found what I needed. I'll post it as an answer.