What would be the best way (if there is one) to get the accession number of the genome assembly that contains a given protein accession number?
E.g. if I just have the results of a blastp run of a given query against the NCBI nr database and take the accession numbers of the subjects, how do I best find out whether a corresponding genome assembly exists that contains these subject proteins, and what its genome assembly accession number is?
I am not simply looking to find similar proteins in the genome databases (so no tblastn against WGS, for example), but to find out which exact genome was the source of which exact protein accession.
I was originally hoping to simply parse that info from the GenBank entry of the subject protein in question, but it turns out the protein GenBank records do not necessarily contain it... Is there perhaps a lookup table linking protein accessions to genome assemblies?
I am looking for an automatable solution (so command-line based).
Can anybody help with that?
This worked for me, using the Entrez Direct tools:
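One possible shape for such a pipeline is sketched here; it is not necessarily the exact command referred to above. It assumes Entrez Direct is installed and that `efetch -db protein -format ipg` returns the tab-separated Identical Protein Groups report with header columns named `Protein` and `Assembly`. The function names `ipg_to_assembly` and `protein_to_assembly` and the input file `accessions.txt` are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: map protein accessions to genome assembly accessions
# via the Identical Protein Groups (IPG) report.
# Assumption: Entrez Direct (efetch) is installed and on PATH.

# Read an IPG table on stdin and print "protein<TAB>assembly" pairs.
# Columns are located by header name ("Protein", "Assembly") rather than
# by position, because the exact column order is an assumption here.
ipg_to_assembly() {
  awk -F'\t' '
    { sub(/\r$/, "") }                 # tolerate Windows-style line endings
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "Protein")  p = i
        if ($i == "Assembly") a = i
      }
      next
    }
    # Skip rows that have no assembly listed.
    p && a && $a != "" { print $p "\t" $a }
  '
}

# Hypothetical driver: $1 is a file with one protein accession per line.
protein_to_assembly() {
  while read -r acc; do
    [ -n "$acc" ] || continue
    # </dev/null keeps efetch from consuming the accession list on stdin.
    efetch -db protein -id "$acc" -format ipg < /dev/null | ipg_to_assembly
  done < "$1"
}
```

With the functions sourced, `protein_to_assembly accessions.txt > protein2assembly.tsv` would write one line per protein/assembly pair. A protein (especially a RefSeq `WP_` accession) can show up in many assemblies, so expect multiple lines per query, and proteins without any listed assembly produce no output.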