Hello everyone,
I'm trying to find an elegant way to retrieve all sequences from Nuccore (NCBI Nucleotide) that have been added within a given time window (for example, the past week).
So far I have found the genome report files, which list all genomes for a given class of organisms, e.g. ftp://ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/viruses.txt. These could be parsed to see what is new.
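One way to "see what is new" in a report file is to keep the previous snapshot and diff it against the current one. The sketch below assumes that header/comment lines start with "#" and that each remaining line is one record; it compares whole lines, so a record whose contents changed (e.g. an updated date column) will also show up as new. The function names are hypothetical.

```python
def _records(lines):
    # Keep non-blank lines that are not "#"-prefixed header/comment lines
    # (assumed header format), with the trailing newline stripped.
    return [ln.rstrip("\n") for ln in lines if ln.strip() and not ln.startswith("#")]


def new_records(old_lines, new_lines):
    """Records present in the new snapshot but not in the old one, in file order."""
    old = set(_records(old_lines))
    return [rec for rec in _records(new_lines) if rec not in old]


def new_records_from_files(old_path, new_path):
    """Same comparison, reading two downloaded copies of a report file."""
    with open(old_path, encoding="utf-8") as old_fh, open(new_path, encoding="utf-8") as new_fh:
        return new_records(old_fh, new_fh)
```

For example, new_records_from_files("viruses_last_week.txt", "viruses.txt") would list the lines added since the older copy was saved (both file names are placeholders).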
I found that efetch and esearch allow searching PubMed with date parameters, but as far as I can tell, date searches are not allowed for nuccore.
That's all I've got.
Any good ideas are welcome.
Thanks for your help!
Well done! Piped into efetch, it works perfectly.
Many thanks!
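For reference, here is a minimal sketch of the esearch-then-efetch approach against the E-utilities URLs. It assumes the ESearch date parameters (datetype together with reldate) apply to db=nuccore; "pdat" (publication date) is used by default and "mdat" (modification date) is the alternative. The query txid10239[Organism:exp] (all viruses) is only an illustrative term, and retmax=10000 is an assumed per-request cap, so larger result sets would be paged with retstart.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term, days, datetype="pdat", retmax=10000, retstart=0):
    """ESearch URL for nuccore records dated within the last `days` days."""
    params = {
        "db": "nuccore",
        "term": term,
        "datetype": datetype,  # "pdat" = publication date, "mdat" = modification date
        "reldate": days,       # restrict to the last N days
        "retmax": retmax,      # IDs per page (assumed cap); page with retstart
        "retstart": retstart,
        "retmode": "json",
    }
    return EUTILS + "esearch.fcgi?" + urlencode(params)


def efetch_url(ids, rettype="fasta"):
    """EFetch URL returning the given nuccore UIDs as plain text (FASTA by default)."""
    params = {
        "db": "nuccore",
        "id": ",".join(str(i) for i in ids),
        "rettype": rettype,
        "retmode": "text",
    }
    return EUTILS + "efetch.fcgi?" + urlencode(params)


def get(url, timeout=60):
    """Download a URL and return its body as text (network access required)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Usage, e.g.: ids_json = get(esearch_url("txid10239[Organism:exp]", 7)), then pass the collected IDs to efetch_url in batches rather than all at once.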
Unfortunately, it's far from perfect. Efetch quite often fails on larger downloads, and it doesn't necessarily print a warning or anything. I would first download the GIs instead of the FASTA, and to begin with, check that the number of downloaded GIs matches the count reported by the search.
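The count check could look like the sketch below. It assumes the search was run with retmode=json, whose response carries the total hit count and the returned IDs under "esearchresult" ("count" is a string there, hence the int conversion). The function names are hypothetical.

```python
import json


def parse_esearch(text):
    """Return (reported_count, id_list) from an ESearch JSON response body."""
    result = json.loads(text)["esearchresult"]
    return int(result["count"]), result["idlist"]


def check_complete(reported_count, ids):
    """Raise if the number of collected IDs differs from what ESearch reported."""
    if len(ids) != reported_count:
        raise RuntimeError(f"expected {reported_count} IDs, got {len(ids)}")
```

When the result set is paged with retstart, the idlist of every page would be concatenated before calling check_complete against the count from the first page.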
Then I'd split the list of GIs with split into files of, e.g., 500 lines each and loop over those.
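The same split-and-loop idea in Python, as a sketch: batches mirrors split -l 500 on an ID file, and fetch_all calls a caller-supplied fetch function once per batch. The retry loop and the pause between attempts are additions of this sketch, not part of the suggestion above; the pause also helps stay under NCBI's request-rate limit. fetch could be, for example, a function that downloads efetch output for one batch.

```python
import time


def batches(ids, size=500):
    """Yield consecutive slices of at most `size` IDs, like `split -l 500`."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_all(ids, fetch, size=500, retries=3, pause=1.0):
    """Call fetch(batch) for each batch, retrying a failed batch up to `retries` times."""
    results = []
    for batch in batches(ids, size):
        for attempt in range(1, retries + 1):
            try:
                results.append(fetch(batch))
                break
            except Exception:
                if attempt == retries:
                    raise  # give up on this batch after the last attempt
                time.sleep(pause)  # back off before retrying
        time.sleep(pause)  # be polite to the server between batches
    return results
```

Keeping each batch's output separate (one list entry per batch) makes it easy to validate and re-download individual batches.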
In addition, you need to build some kind of check for these batch downloads: e.g. each output file should contain as many FASTA headers as there were lines in the corresponding ID file. Then all is well, as long as the download didn't fail in the middle of the last sequence :)
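A per-batch check along these lines might look like the following sketch. The header count is the check described above; the trailing-newline test is only an assumed, weak heuristic for the "failed in the middle of the last sequence" case, since a download that stops exactly at a line end would still pass.

```python
def count_headers(fasta_text):
    """Number of FASTA header lines (lines starting with '>')."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def batch_looks_complete(fasta_text, expected_ids):
    """True if the header count matches the batch size and the text ends with a newline."""
    if count_headers(fasta_text) != len(expected_ids):
        return False
    # Weak heuristic: output cut off mid-line usually lacks the final newline,
    # but this cannot detect every truncation.
    return fasta_text.endswith("\n")
```

A batch that fails this check would simply be downloaded again.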