I'm using both GenBank files (downloaded from the NCBI FTP site) and the corresponding .ptt files (available as all.ptt.tar.gz here).
However, although each gene is identified by a GI in both files, the GIs do not match. As one example, here are the GenBank and ptt files for the same Acaryochloris marina chromosome. The AM10003 locus is identified as 158333234 in the ptt file and as 158303475 in the GenBank file. (Searching for AM10003 at NCBI turns up both of those IDs as well.)
How can I convert from the GenBank ID to the ptt ID, or vice versa? (I have to do this for an extremely large number of genes, so querying NCBI's site for each locus tag isn't a practical option.)
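Since the locus tag appears next to the GI in both files, one offline way to do the conversion is to join the two files on the locus tag. Below is a minimal Python sketch of that idea, not a definitive implementation. It assumes the usual tab-delimited .ptt layout (PID in the fourth column, locus tag in the sixth, "Synonym", column) and that the GenBank CDS features carry both /locus_tag="..." and /db_xref="GI:..." qualifiers. The function names are made up for illustration.

```python
import re

# A feature key line in a GenBank flat file starts with exactly five spaces.
FEATURE_START = re.compile(r"^ {5}\S")
LOCUS_TAG = re.compile(r'/locus_tag="([^"]+)"')
GI_XREF = re.compile(r'/db_xref="GI:(\d+)"')


def read_ptt_gis(path):
    """Return {locus_tag: GI} from a .ptt file (assumed column layout)."""
    mapping = {}
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            # Title, protein-count and header lines have no numeric PID,
            # so they are skipped by this check.
            if len(fields) >= 6 and fields[3].isdigit():
                mapping[fields[5]] = fields[3]
    return mapping


def read_genbank_gis(path):
    """Return {locus_tag: GI} for features that carry both qualifiers."""
    mapping = {}
    locus_tag = gi = None
    with open(path) as handle:
        for line in handle:
            if FEATURE_START.match(line):
                # A new feature begins: forget the previous feature's values.
                locus_tag = gi = None
            tag_match = LOCUS_TAG.search(line)
            if tag_match:
                locus_tag = tag_match.group(1)
            gi_match = GI_XREF.search(line)
            if gi_match:
                gi = gi_match.group(1)
            if locus_tag and gi:
                mapping[locus_tag] = gi
    return mapping


def genbank_to_ptt_gi(genbank_path, ptt_path):
    """Return {GenBank GI: ptt GI}, joined on the shared locus tag."""
    gbk = read_genbank_gis(genbank_path)
    ptt = read_ptt_gis(ptt_path)
    return {gbk[tag]: ptt[tag] for tag in gbk.keys() & ptt.keys()}
```

Calling genbank_to_ptt_gi on a matching pair of files gives a dictionary keyed by the GenBank GI; swapping keys and values gives the reverse direction. Locus tags present in only one of the two files are silently dropped, so it is worth comparing the sizes of the three dictionaries.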
Excellent, thanks. Just one more question: do you know if there is a single file containing all the .ptt files for my version of the GenBank files, analogous to ftp://ftp.ncbi.nih.gov/genomes/Bacteria/all.ptt.tar.gz? Or should I download them individually?
You have to download them individually. On Linux, you can fetch all the .ptt files in a genome's directory with a single wget command: wget 'ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Acaryochloris_marina_MBIC11017_uid12997/*.ptt'
I actually tried that command, but NCBI appears to throttle such requests, so it never gets more than a few files. I'll try other approaches. Thanks!
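One possible alternative to the wget glob is to fetch the files one at a time over a single FTP connection, with a pause between transfers. The sketch below uses Python's standard ftplib. The function name, the destination folder and the one-second default delay are assumptions for illustration, and whether a pause is enough to avoid the throttling described above is untested.

```python
import os
import time
from ftplib import FTP


def download_ptt_files(ftp, remote_dir, dest_dir, delay_seconds=1.0):
    """Download every .ptt file in remote_dir, one at a time.

    `ftp` is a connected, logged-in ftplib.FTP object. Returns the list of
    remote file names that were fetched.
    """
    os.makedirs(dest_dir, exist_ok=True)
    ftp.cwd(remote_dir)
    names = [name for name in ftp.nlst() if name.endswith(".ptt")]
    for index, name in enumerate(names):
        local_path = os.path.join(dest_dir, os.path.basename(name))
        with open(local_path, "wb") as out:
            ftp.retrbinary("RETR " + name, out.write)
        if index < len(names) - 1:
            # Pause between files (assumed, not verified, to help with throttling).
            time.sleep(delay_seconds)
    return names


# Example call, using the directory from the wget command above
# ("ptt_files" is a hypothetical local folder):
#
# with FTP("ftp.ncbi.nih.gov") as ftp:
#     ftp.login()  # anonymous login
#     download_ptt_files(
#         ftp,
#         "/genbank/genomes/Bacteria/Acaryochloris_marina_MBIC11017_uid12997",
#         "ptt_files",
#     )
```

To cover many genomes, the same function could be called once per genome directory over one connection, possibly with a longer delay.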