Hello guys, I need to build the viral database with kraken2. Building the SILVA database succeeded, but I am having trouble with the viral one.
Here are the commands I ran and the errors I got:
storage/Kraken2/kraken2-build --download-library viral --threads 4 --db /storage/KrakenViral
rsync_from_ncbi.pl: unexpected FTP path (new server?) for https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/839/185/GCF_000839185.1_ViralProj14174
kraken2-build --use-ftp --download-library viral --db /storage/KrakenViral/
Error downloading assembly summary file for viral, exiting.
kraken-build --download-library viral --db /storage/KrakenDBViral/ --use-wget
Error downloading assembly summary file for viral, exiting.
I already applied the changes to rsync_from_ncbi.pl as described in https://githubmemory.com/repo/DerrickWood/kraken2/issues/465, but nothing changed.
Please, can someone help me?
Thank you very much.
Emilio
The NCBI link works and there are files in that directory, so this could be a firewall issue on your end.
Hello, that was my first thought too, but it is not the case. I tried fetching the link from my local network (the same host that has the problem with kraken), and it worked:
emilio@bilbo-06:~$ wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/839/185/GCF_000839185.1_ViralProj14174
--2021-11-22 10:18:48-- https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/839/185/GCF_000839185.1_ViralProj14174
Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 2607:f220:41e:250::13, 2607:f220:41e:250::11, 130.14.250.7, ...
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|2607:f220:41e:250::13|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/839/185/GCF_000839185.1_ViralProj14174/ [following]
--2021-11-22 10:18:51-- https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/839/185/GCF_000839185.1_ViralProj14174/
Reusing existing connection to [ftp.ncbi.nlm.nih.gov]:443.
HTTP request sent, awaiting response... 200 OK
Length: 2657 (2.6K) [text/html]
Saving to: ‘GCF_000839185.1_ViralProj14174’
GCF_000839185.1_ViralProj14174 100%[=====================================================================>] 2.59K --.-KB/s in 0s
2021-11-22 10:18:51 (65.1 MB/s) - ‘GCF_000839185.1_ViralProj14174’ saved [2657/2657]
Then you just need to make sure you use the correct/complete script that you linked.
Hello. In the end I fixed the rsync_from_ncbi.pl script, and the download then worked:
emilio@bilbo-06:/storage/Kraken2$ /storage/Kraken2/kraken2-build --download-library viral --threads 4 --db /storage/KrakenViral
Rsync dry run complete, removing any non-existent files from manifest.
Step 1/2: Performing rsync file transfer of requested files
Rsync file transfer complete.
Step 2/2: Assigning taxonomic IDs to sequences
Processed 11808 projects (14719 sequences, 463.50 Mbp)... done.
All files processed, cleaning up extra sequence files... done, library complete.
Masking low-complexity regions of downloaded library... done.
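For anyone who lands here with the same "unexpected FTP path (new server?)" message: the path in that error starts with https://, so my understanding is that the script only strips an ftp:// prefix from the paths in the assembly summary and dies when the prefix does not match. The snippet below is only a Python sketch of that kind of check, not the Perl code from rsync_from_ncbi.pl. The names SERVER, SERVER_PATH and strip_server_prefix are made up for illustration, and the assumption that the server path is /genomes is mine.

```python
import re

# Hypothetical names for illustration; the real script is Perl and may differ.
SERVER = "ftp.ncbi.nlm.nih.gov"
SERVER_PATH = "/genomes"  # assumption, not taken from the thread


def strip_server_prefix(url, schemes=("ftp",)):
    """Return the path relative to SERVER_PATH, or None if the prefix does not match."""
    scheme_alternatives = "|".join(re.escape(s) for s in schemes)
    pattern = r"^(?:%s)://%s%s/" % (
        scheme_alternatives,
        re.escape(SERVER),
        re.escape(SERVER_PATH),
    )
    stripped, count = re.subn(pattern, "", url)
    return stripped if count else None


url = ("https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/839/185/"
       "GCF_000839185.1_ViralProj14174")

# An ftp-only check rejects the https path, which is where a script would bail out.
print(strip_server_prefix(url))

# Accepting https as well yields the relative path that can be handed to rsync.
print(strip_server_prefix(url, schemes=("ftp", "https")))
```

If that is indeed the cause, the fix on the Perl side is to let the prefix match accept https:// as well, which seems to be what the change discussed in the linked issue does.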
Here is the final version of rsync_from_ncbi.pl, in the hope that it will be helpful to someone: