Search for last Nuccore Entries
9.1 years ago

Hello everyone,

I'm trying to find an elegant way to retrieve all sequences from Nuccore (the NCBI nucleotide database) that have been added within a given time span (for example, the last week).

So far I have found the genome report files, which list all genomes for a given class of organism: ftp://ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/viruses.txt (it would be possible to parse them and see what is new...)

I found that efetch and esearch can search PubMed with date parameters, but as far as I can tell date searches are not allowed for nuccore.

That's all I've got.

Any good idea is welcome

Thanks for your help

5heikki 11k

With Entrez Direct, you can search for what was published between 1 October and 9 November 2015:

esearch -db nuccore -query '("2015/10/01"[Publication Date] : "2015/11/09"[Publication Date])'
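To avoid hard-coding the dates, the query string could be built from two dates. This is only a minimal sketch: the function name `date_range_query` is hypothetical, and the `date -d` call in the usage comment assumes GNU date (BSD/macOS date uses different options).

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an Entrez date-range query
# from two dates given as YYYY/MM/DD.
date_range_query() {
    local start=$1 end=$2
    printf '("%s"[Publication Date] : "%s"[Publication Date])\n' "$start" "$end"
}

# Usage (assumes GNU date and Entrez Direct on PATH; not executed here):
#   start=$(date -d '7 days ago' +%Y/%m/%d)
#   end=$(date +%Y/%m/%d)
#   esearch -db nuccore -query "$(date_range_query "$start" "$end")"
```

Keeping the query construction in one place makes it easier to rerun the same search for a different window, such as the last week.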

Well done. Piped into efetch, it's perfect:

esearch -db nuccore -query '("2015/11/08"[Publication Date] : "2015/11/09"[Publication Date])' | efetch -format fasta

Many thanks!


Unfortunately, it is far from perfect. efetch quite often fails on larger downloads and doesn't necessarily emit a warning or anything. I would download the GIs instead of FASTA and, to begin with, check that the number of downloaded GIs is the same as the count reported by:

esearch -db nuccore -query '("2015/10/01"[Publication Date] : "2015/11/09"[Publication Date])' | xtract -pattern ENTREZ_DIRECT -element Count
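The count check described above might be sketched as follows. The function name `fetch_ids_checked` and the output file name are hypothetical; the sketch assumes Entrez Direct is on the PATH and that `efetch -format uid` prints one UID (GI) per line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: download the UID list for a query and compare
# its length with the Count reported by esearch.
fetch_ids_checked() {
    local query=$1 out_file=$2
    local expected got
    # Number of records the search claims to match.
    expected=$(esearch -db nuccore -query "$query" |
        xtract -pattern ENTREZ_DIRECT -element Count)
    # Download the UIDs themselves, one per line.
    esearch -db nuccore -query "$query" | efetch -format uid > "$out_file"
    # Count the non-empty lines that actually arrived.
    got=$(grep -c . "$out_file")
    if [ "$expected" = "$got" ]; then
        echo "OK $got"
    else
        echo "MISMATCH expected=$expected got=$got"
        return 1
    fi
}

# Usage (needs network access; not executed here):
#   fetch_ids_checked '("2015/10/01"[Publication Date] : "2015/11/09"[Publication Date])' ids.txt
```

A non-zero return status lets a calling script retry or stop before any sequences are fetched.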

Then I'd split the list of GIs with split into files of, e.g., 500 lines each and then loop over those:
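The splitting step could look roughly like this. The function name `split_ids`, the `chunk_` prefix, and the input file name are assumptions made for illustration; the `.splitFile` suffix is added so that the chunks match a `*.splitFile` glob.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split an ID list into chunks of N lines
# (default 500) named chunk_aa.splitFile, chunk_ab.splitFile, ...
split_ids() {
    local ids_file=$1 chunk_size=${2:-500}
    local f
    # POSIX split writes chunk_aa, chunk_ab, ... into the current directory.
    split -l "$chunk_size" "$ids_file" chunk_
    for f in chunk_*; do
        # Skip files that already carry the suffix from an earlier run.
        case $f in *.splitFile) continue ;; esac
        mv "$f" "$f.splitFile"
    done
}

# Usage (not executed here):
#   split_ids ids.txt 500
```

Smaller chunks mean more requests but less to repeat when a single download fails.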

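for f in *.splitFile
do
    # Join the IDs in this chunk into one comma-separated string
    IDs=$(tr '\n' ',' < "$f" | sed 's/,$//')
    # Post the IDs to the Entrez history server, then fetch them as FASTA
    epost -db nuccore -id "$IDs" | efetch -format fasta > "$f.fna"
done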

In addition, you need to build some kind of check for these batch downloads. For example, each output file should contain as many FASTA headers as there are lines in the corresponding ID file. All is well then, as long as the download didn't fail in the middle of the last sequence :)
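One way such a check might look, as a rough sketch: the function name `check_batch` is hypothetical, and it assumes one ID per line in the ID file and one `>` header per sequence in the FASTA file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the number of FASTA headers in a download
# with the number of IDs that were requested.
check_batch() {
    local ids_file=$1 fasta_file=$2
    local expected got
    # Count non-empty lines in the ID file.
    expected=$(grep -c . "$ids_file")
    # Count FASTA header lines (those starting with '>').
    got=$(grep -c '^>' "$fasta_file")
    if [ "$expected" -eq "$got" ]; then
        echo "OK $fasta_file $got"
    else
        echo "MISMATCH $fasta_file expected=$expected got=$got"
        return 1
    fi
}

# Usage (not executed here):
#   for f in *.splitFile; do check_batch "$f" "$f.fna" || echo "redo $f"; done
```

As noted above, a header count cannot detect a download that was cut off in the middle of the last sequence, so this catches missing records but not a truncated final one.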
