I want to make a blastdb alias with blastdb_aliastool for Homo sapiens. I can search online, export a GI list, and use it with the alias tool. However, I would have to repeat the process again after some time. Is there a way I can fetch GI numbers from the command line, or via Python/Biopython or R (or, if unavoidable, Perl)?
Update
After reading elsewhere on Biostars and following the instructions here to install the EDirect command-line apps, I did this: `epost -db taxonomy -id 9606 | elink -batch -target nuccore | efetch -format uid > hsapiens.gi.txt`
The output file hsapiens.gi.txt contains only 100000 records, which is the maximum retrieval for one batch. Reading further about the E-utilities, I think I have to use the `retstart` parameter somewhere in the command line above. But how?
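For what it's worth, the `retstart` idea can be sketched in Python: the E-utilities return at most `retmax` UIDs per request, so you loop, bumping `retstart` by the page size each call until you have covered the total count. The sketch below keeps the paging logic self-contained by stubbing out the network call (`fetch_page` and `fake_fetch` are illustrative names, not E-utilities APIs; in real use the callable would hit esearch/efetch, e.g. via Biopython's `Entrez.esearch`, which accepts `retstart` and `retmax`):

```python
# Sketch of E-utilities pagination with retstart/retmax.
# fetch_page is an injected callable standing in for one HTTP request
# (e.g. an esearch call against eutils.ncbi.nlm.nih.gov), so the
# paging loop itself can be shown and tested without a network.

def fetch_all_ids(fetch_page, total, page_size=100000):
    """Collect all UIDs by issuing one request per retstart offset."""
    ids = []
    for retstart in range(0, total, page_size):
        # Each call mirrors esearch's retstart/retmax parameters.
        ids.extend(fetch_page(retstart=retstart, retmax=page_size))
    return ids

# Demo with a fake backend of 250 "GI numbers" and a page size of 100,
# forcing three requests (offsets 0, 100, 200):
fake_db = list(range(250))

def fake_fetch(retstart, retmax):
    return fake_db[retstart:retstart + retmax]

all_ids = fetch_all_ids(fake_fetch, total=len(fake_db), page_size=100)
assert all_ids == fake_db
```

With a real backend you would first get the total hit count from an initial esearch response and then feed it in as `total`; this is only the shape of the loop, not a drop-in command.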
I'm sure it's possible to brute-force/scrape your way to all the GIs, or perhaps programmatically via BioMart, but my experience of doing this against NCBI's websites is: don't bother, because whatever you write to do the scraping now will not work in six months. You are better off just writing a nice e-mail to NCBI and asking for the data. They will reply before lunchtime, maybe sooner if it's all just sitting in an Excel file anyway from the last person who asked, and they may even tell you about API/SQL access you could use to keep your lists as up to date as possible. This goes for pretty much anything where you want a large amount of data once, then not much ever again.
Thanks for the suggestion. I will contact NCBI if the question isn't answered here soon. I chose to post here because the answer may benefit others as well.