I want local copies because I am trying to implement parallel BLAST using Apache Spark. I chose this particular domain, bioinformatics, because I took an intro course. I apologize if this question comes across as a bit lazy, but I thought there might be multiple options. I'm looking here: https://ftp.ncbi.nlm.nih.gov/ but can't figure out which one is the protein database.
My plan was to download SARS-CoV-2 protein sequences and run BLAST.
Yes, I saw this too: https://ftp.ncbi.nlm.nih.gov/blast/db/
refseq_protein.00.tar.gz 2022-12-06 01:45 8.9G
refseq_protein.00.tar.gz.md5 2022-12-06 01:45 59
refseq_protein.01.tar.gz 2022-12-06 01:45 2.1G
refseq_protein.01.tar.gz.md5 2022-12-06 01:45 59
refseq_protein.02.tar.gz 2022-12-06 01:45 2.1G
refseq_protein.02.tar.gz.md5 2022-12-06 01:45 59
refseq_protein.03.tar.gz 2022-12-06 01:45 2.1G
refseq_protein.03.tar.gz.md5 2022-12-06 01:45 59
refseq_protein.04.tar.gz 2022-12-06 01:45 2.1G
refseq_protein.04.tar.gz.md5 2022-12-06 01:45 59
refseq_protein.05.tar.gz 2022-12-06 01:46 2.1G
refseq_protein.05.tar.gz.md5 2022-12-06 01:46 59
refseq_protein.06.tar.gz 2022-12-06 01:46 2.1G
refseq_protein.06.tar.gz.md5 2022-12-06 01:46 59
refseq_protein.07.tar.gz 2022-12-06 01:46 2.1G
refseq_protein.07.tar.gz.md5 2022-12-06 01:46 59
refseq_protein.08.tar.gz 2022-12-06 01:46 2.1G
refseq_protein.08.tar.gz.md5 2022-12-06 01:46 59
refseq_protein.09.tar.gz 2022-12-06 01:46 2.1G
refseq_protein.09.tar.gz.md5 2022-12-06 01:46 59
refseq_protein.10.tar.gz 2022-12-06 01:47 2.1G
refseq_protein.10.tar.gz.md5 2022-12-06 01:47 59
refseq_protein.11.tar.gz 2022-12-06 01:47 2.1G
refseq_protein.11.tar.gz.md5 2022-12-06 01:47 59
refseq_protein.12.tar.gz 2022-12-06 01:47 2.1G
refseq_protein.12.tar.gz.md5 2022-12-06 01:47 59
refseq_protein.13.tar.gz 2022-12-06 01:47 2.1G
refseq_protein.13.tar.gz.md5 2022-12-06 01:47 59
refseq_protein.14.tar.gz 2022-12-06 01:47 2.1G
refseq_protein.14.tar.gz.md5 2022-12-06 01:47 59
refseq_protein.15.tar.gz 2022-12-06 01:47 2.1G
refseq_protein.15.tar.gz.md5 2022-12-06 01:47 59
refseq_protein.16.tar.gz 2022-12-06 01:48 2.1G
refseq_protein.16.tar.gz.md5 2022-12-06 01:48 59
refseq_protein.17.tar.gz 2022-12-06 01:48 2.1G
refseq_protein.17.tar.gz.md5 2022-12-06 01:48 59
refseq_protein.18.tar.gz 2022-12-06 01:48 2.1G
refseq_protein.18.tar.gz.md5 2022-12-06 01:48 59
refseq_protein.19.tar.gz 2022-12-06 01:48 2.1G
refseq_protein.19.tar.gz.md5 2022-12-06 01:48 59
refseq_protein.20.tar.gz 2022-12-06 01:48 2.1G
refseq_protein.20.tar.gz.md5 2022-12-06 01:48 59
refseq_protein.21.tar.gz 2022-12-06 01:48 2.1G
refseq_protein.21.tar.gz.md5 2022-12-06 01:48 59
refseq_protein.22.tar.gz 2022-12-06 01:49 2.1G
refseq_protein.22.tar.gz.md5 2022-12-06 01:49 59
refseq_protein.23.tar.gz 2022-12-06 01:49 2.1G
refseq_protein.23.tar.gz.md5 2022-12-06 01:49 59
refseq_protein.24.tar.gz 2022-12-06 01:49 2.1G
refseq_protein.24.tar.gz.md5 2022-12-06 01:49 59
refseq_protein.25.tar.gz 2022-12-06 01:49 2.1G
refseq_protein.25.tar.gz.md5 2022-12-06 01:49 59
refseq_protein.26.tar.gz 2022-12-06 01:49 2.1G
refseq_protein.26.tar.gz.md5 2022-12-06 01:49 59
refseq_protein.27.tar.gz 2022-12-06 01:50 2.1G
refseq_protein.27.tar.gz.md5 2022-12-06 01:50 59
refseq_protein.28.tar.gz 2022-12-06 01:50 2.1G
refseq_protein.28.tar.gz.md5 2022-12-06 01:50 59
refseq_protein.29.tar.gz 2022-12-06 01:50 2.1G
refseq_protein.29.tar.gz.md5 2022-12-06 01:50 59
refseq_protein.30.tar.gz 2022-12-06 01:50 1.5G
refseq_protein.30.tar.gz.md5 2022-12-06 01:50 59
That is quite big if I have to download every numbered volume to get the full set.
Thanks.
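For a rough sense of scale, the sizes in the listing above can simply be added up. This is a back-of-the-envelope sketch in Python that uses only the rounded sizes shown in the listing (one 8.9G volume, twenty-nine 2.1G volumes, one 1.5G volume); the constant and function names are my own, and the result is the compressed download size only, not the size after extraction.

```python
# Rounded sizes in GB, copied from the directory listing above.
FIRST_VOLUME_GB = 8.9     # refseq_protein.00
MIDDLE_VOLUME_GB = 2.1    # each of refseq_protein.01 through refseq_protein.29
LAST_VOLUME_GB = 1.5      # refseq_protein.30
MIDDLE_VOLUME_COUNT = 29  # volumes 01..29


def total_download_gb() -> float:
    """Sum the listed (rounded) sizes of all refseq_protein volumes."""
    total = (
        FIRST_VOLUME_GB
        + MIDDLE_VOLUME_COUNT * MIDDLE_VOLUME_GB
        + LAST_VOLUME_GB
    )
    # Round to one decimal place, matching the precision of the listing.
    return round(total, 1)


if __name__ == "__main__":
    volumes = MIDDLE_VOLUME_COUNT + 2
    print(f"About {total_download_gb()} GB compressed across {volumes} volumes")
```

So the full refseq_protein set comes to roughly 71 GB of compressed archives across 31 volumes.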
You can either make your own database or, if you want a preformatted one, pataa.tar.gz may be on the smaller end.
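If you go the "make your own database" route, the usual tool is makeblastdb from the BLAST+ suite. Below is a minimal sketch in Python, assuming BLAST+ is installed and on your PATH and that you already have a protein FASTA file; the file name sars_cov2_proteins.fasta, the database name, and the helper function names are hypothetical placeholders, not something from the NCBI site.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def makeblastdb_command(fasta_path: str, db_name: str) -> List[str]:
    """Build the makeblastdb argument list for a protein database."""
    return [
        "makeblastdb",
        "-in", fasta_path,   # input FASTA file with protein sequences
        "-dbtype", "prot",   # protein database (use "nucl" for nucleotides)
        "-out", db_name,     # base name for the generated database files
    ]


def build_protein_db(fasta_path: str, db_name: str) -> bool:
    """Run makeblastdb if the tool and the input file are both available.

    Returns True if the command was run, False if it was skipped.
    """
    if shutil.which("makeblastdb") is None:
        return False  # BLAST+ is not installed or not on PATH
    if not Path(fasta_path).is_file():
        return False  # nothing to index
    subprocess.run(makeblastdb_command(fasta_path, db_name), check=True)
    return True


if __name__ == "__main__":
    # Hypothetical file and database names, for illustration only.
    ran = build_protein_db("sars_cov2_proteins.fasta", "sars_cov2_db")
    print("database built" if ran else "skipped: makeblastdb or input file not found")
```

A small custom database like this is much easier to copy to every Spark worker than the full refseq_protein set, which may matter for your parallel setup.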