Hi, I have a list of ~2000 taxids and would like to retrieve all available nucleotide sequences for each taxon to build a reference database. With Batch Entrez I only get an error, even when uploading a single taxid or accession number (as .txt or .xml): "An illegal character in a token. Possible wrong file format. Request processing canceled." It also doesn't work with this Perl script:
perl -e 'use LWP::Simple;getstore("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&rettype=fasta&retmode=text&id=".join(",",qw(410645, 410645, ...)),"seqs.fasta");'
with "..." being the list of taxids. Even with only a few IDs, it retrieves far fewer sequences than are available in GenBank. Not sure what's wrong there.
Can anyone advise how to compile those (with little to no coding skills...)? Thanks!
Two options, neither completely trivial: either use the command-line E-utilities in a shell script and loop over all taxids read from a file, or download the whole NT database (which you might already have) together with the NCBI taxonomy, and create an accession list for each taxid to pass to BLAST.
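The first option could look roughly like the sketch below. It assumes NCBI's Entrez Direct tools (esearch and efetch) are installed and on your PATH. The file name taxids.txt (one taxid per line) and the output directory fasta are made-up names for illustration. Note that efetch's id parameter expects sequence identifiers rather than taxids, which is probably why the Perl one-liner returns so few records; here a search step turns each taxid into the matching set of sequences first.

```shell
#!/usr/bin/env bash
# Sketch only: fetch all nucleotide records for each taxid listed in a file.
# Assumes Entrez Direct (esearch, efetch) is installed.
# taxids.txt and fasta/ are hypothetical names; change them as needed.

taxid_file="${1:-taxids.txt}"
out_dir="${2:-fasta}"

# Build an Entrez query matching a taxon and all of its descendants.
build_query() {
    printf 'txid%s[Organism:exp]' "$1"
}

fetch_all() {
    mkdir -p "$out_dir"
    # "|| [ -n ... ]" also handles a last line without a trailing newline.
    while read -r taxid || [ -n "$taxid" ]; do
        # Keep digits only (drops Windows line endings and stray spaces).
        taxid=$(printf '%s' "$taxid" | tr -cd '0-9')
        [ -z "$taxid" ] && continue
        # </dev/null stops esearch from swallowing the rest of the taxid list.
        esearch -db nuccore -query "$(build_query "$taxid")" < /dev/null |
            efetch -format fasta > "$out_dir/$taxid.fasta"
        sleep 1   # pause between taxa to go easy on the NCBI servers
    done < "$taxid_file"
}

# Do nothing if the taxid list is missing.
if [ -f "$taxid_file" ]; then
    fetch_all
fi
```

Saved as, say, fetch_by_taxid.sh, it would be run with `bash fetch_by_taxid.sh taxids.txt fasta` and should leave one FASTA file per taxid in the output directory. For large taxa the download can take a long time, so it may be worth trying it on two or three taxids first.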