I want to run a local BLAST search against all the bacterial protein data from NCBI instead of NR. Is there a way to download all of that data from NCBI, or to filter the NR database locally?
Hello,
if you just want to download all the bacterial proteins from NCBI, you could use Entrez Direct. Here is a small example that could do the job (restricted to RefSeq):
esearch -db protein -query "bacteria [ORGN] AND refseq [filter]" | efetch -format fasta > bacterial_proteins
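Once the FASTA file is there, it still has to be turned into a BLAST database before you can search it locally. A minimal sketch, assuming the BLAST+ command-line tools (`makeblastdb`, `blastp`) are installed; the function names, the database name `bacteria_db`, the query file `my_queries.faa`, and the e-value cutoff are only placeholders I made up for illustration:

```shell
# Build a protein BLAST database from a FASTA file.
# Usage: build_db FASTA_FILE DB_NAME
build_db() {
    makeblastdb -in "$1" -dbtype prot -out "$2"
}

# Search a protein query file against that database.
# Output is tabular (-outfmt 6); the e-value cutoff is an arbitrary example.
# Usage: search_db QUERY_FASTA DB_NAME RESULT_FILE
search_db() {
    blastp -query "$1" -db "$2" -outfmt 6 -evalue 1e-5 -out "$3"
}

# Example calls (hypothetical file names):
#   build_db bacterial_proteins bacteria_db
#   search_db my_queries.faa bacteria_db hits.tsv
```

The tabular output is convenient for filtering afterwards with standard text tools.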
See this post and my answer inside:
where can I get environmental bacteria genome in fasta format (as many as possible)?
The NCBI FTP structure has changed, so I am not sure about recent archives.
But to find at least something, see ftp://ftp.ncbi.nlm.nih.gov/genomes/archive/old_refseq/Bacteria/
and the all.faa.tar.gz file there. I don't know where to find a current version; the old one is from 02.06.2015.
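If you go with that archive, the proteins need to be unpacked and merged into a single FASTA file before building a BLAST database. A rough sketch, assuming the tarball contains `.faa` files somewhere below its top level (I have not checked the exact layout, so the code searches recursively); `combine_faa` and the output name `all_bacteria.faa` are just names picked for this example:

```shell
# Unpack a .tar.gz of protein FASTA files and concatenate every *.faa
# file found inside into one output file.
# Usage: combine_faa ARCHIVE OUTPUT_FASTA
combine_faa() {
    archive=$1
    output=$2
    workdir=$(mktemp -d)                      # temporary extraction directory
    tar -xzf "$archive" -C "$workdir"
    # Search recursively so the directory layout inside the archive does not matter.
    find "$workdir" -type f -name '*.faa' -exec cat {} + > "$output"
    rm -rf "$workdir"
}

# Example calls (download first, then merge):
#   wget ftp://ftp.ncbi.nlm.nih.gov/genomes/archive/old_refseq/Bacteria/all.faa.tar.gz
#   combine_faa all.faa.tar.gz all_bacteria.faa
```

The merged file can then be used as input for building a local BLAST database.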
Thanks a lot for your help.