So here's the situation. I have a spreadsheet of a few genome assemblies I need to pull from NCBI. I have their accession numbers, like "GCF_003031525", in a row (that accession leads to https://www.ncbi.nlm.nih.gov/assembly/GCF_003031525.1/).
I just need to download a bunch of assemblies (a few dozen) by looping over the accession, so that they all end up on my drive.
I've heard Biopython can access NCBI and do this. I was wondering how to set it up, or whether anyone has already automated this for a list of assemblies they have.
It would look like this with Entrez Direct:
If you specifically want to incorporate this into a (Bio)Python script, Biopython has a submodule for Entrez. The syntax is very similar: http://biopython.org/DIST/docs/tutorial/Tutorial.html#sec139
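Something along these lines could serve as a starting point for the Biopython route. It is a sketch under a few assumptions: Biopython is installed; the spreadsheet has been exported to CSV with a column headed `accession` (the file name `assemblies.csv`, the column name, and the helper functions below are all hypothetical); the `assembly` Entrez database still returns `FtpPath_RefSeq`/`FtpPath_GenBank` in its summaries (NCBI has been steering people toward its Datasets tools, so check this); and you only want the `*_genomic.fna.gz` file from each assembly folder. If a bare accession finds nothing, try the versioned form (e.g. `GCF_003031525.1`).

```python
import csv
import urllib.request
from pathlib import Path

# Assumption: put your own address here; NCBI asks E-utilities users to identify themselves.
EMAIL = "you@example.com"


def read_accessions(csv_path, column="accession"):
    """Return the non-empty accessions from one column of a CSV export (hypothetical layout)."""
    with open(csv_path, newline="") as handle:
        return [row[column].strip() for row in csv.DictReader(handle) if row[column].strip()]


def genomic_fasta_url(ftp_path):
    """Build the https URL of the *_genomic.fna.gz file inside an assembly's FTP folder.

    Files in each folder are named after the folder itself, e.g.
    .../GCF_003031525.1_<asm_name>/GCF_003031525.1_<asm_name>_genomic.fna.gz
    """
    ftp_path = ftp_path.rstrip("/")
    folder = ftp_path.rsplit("/", 1)[-1]
    # The NCBI FTP server also answers over https, which is easier to get through firewalls.
    base = ftp_path.replace("ftp://", "https://", 1)
    return f"{base}/{folder}_genomic.fna.gz"


def ftp_path_for(accession):
    """Look up an assembly's FTP folder with Entrez esearch + esummary."""
    from Bio import Entrez  # Biopython; imported here so the helpers above work without it

    Entrez.email = EMAIL
    handle = Entrez.esearch(db="assembly", term=accession)
    ids = Entrez.read(handle)["IdList"]
    handle.close()
    if not ids:
        raise ValueError(f"no assembly record found for {accession}")
    handle = Entrez.esummary(db="assembly", id=ids[0])
    summary = Entrez.read(handle)
    handle.close()
    doc = summary["DocumentSummarySet"]["DocumentSummary"][0]
    # GCF_ accessions are under the RefSeq path, GCA_ under the GenBank path.
    field = "FtpPath_GenBank" if accession.startswith("GCA_") else "FtpPath_RefSeq"
    path = doc.get(field, "")
    if not path:
        raise ValueError(f"{accession} has no {field} in its summary")
    return path


def download_all(csv_path, out_dir="assemblies"):
    """Download the genomic FASTA for every accession listed in csv_path."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    saved = []
    for accession in read_accessions(csv_path):
        url = genomic_fasta_url(ftp_path_for(accession))
        target = Path(out_dir) / url.rsplit("/", 1)[-1]
        urllib.request.urlretrieve(url, target)
        saved.append(target)
    return saved
```

Calling `download_all("assemblies.csv")` would then fetch every listed assembly into an `assemblies/` folder. Biopython spaces out its Entrez requests to stay within NCBI's rate limit, so a few dozen accessions in a plain loop should be fine; the FASTA downloads themselves go straight to the FTP server rather than through E-utilities.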