Hi,
I found this tool, https://github.com/kblin/ncbi-genome-download, on GitHub for downloading bacterial genomes.
ncbi-genome-download bacteria
It downloads ---> refseq/bacteria/GCF_940077525.1
How do I get FASTA files from the GCF_* directories?
Then I tried this command, but it was unable to download the FASTA files.
ncbi-genome-download --formats fasta bacteria --parallel 16
WARNING: Skipping entry, as it has no ftp directory listed: 'GCF_023646435.1'
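For the question of getting FASTA out of the GCF_* directories, here is a minimal sketch. It assumes the run with --formats fasta left gzip-compressed files named *_genomic.fna.gz inside each GCF_* directory (the usual NCBI assembly naming, not verified against this particular run). The function name collect_fasta and the output file name are hypothetical.

```shell
# Hypothetical helper: concatenate every downloaded FASTA under a root directory.
# Assumption: files are gzip-compressed and named *_genomic.fna.gz.
collect_fasta() {
    root="$1"   # e.g. refseq/bacteria
    out="$2"    # combined, uncompressed FASTA to write
    : > "$out"  # start from an empty output file
    find "$root" -type f -name '*_genomic.fna.gz' | sort | while IFS= read -r f; do
        gzip -dc "$f" >> "$out"
    done
}

# Example call (not run here):
# collect_fasta refseq/bacteria all_bacteria.fna
```

Keeping the per-genome files and building the combined file separately means an interrupted download does not corrupt the merged FASTA.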
In addition, I tried to download sequences directly from the NCBI FTP site.
https://ftp.ncbi.nlm.nih.gov/refseq/release/README
wget ftp://ftp.ncbi.nih.gov/refseq/release/bacteria/
This generates ----> an index.html file
The HTML file contains all the FTP links for the bacterial genomes.
How do I download the sequences using the index.html file? Is there an easy way to download the bacterial and viral genomes?
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html>
<head>
<title>Index of /refseq/release/bacteria on ftp.ncbi.nih.gov:21</title>
</head>
<body>
<h1>Index of /refseq/release/bacteria on ftp.ncbi.nih.gov:21</h1>
<hr>
<pre>
2022 May 05 22:46 File <a href="ftp://ftp.ncbi.nih.gov:21/refseq/release/bacteria/bacteria.1.1.genomic.fna.gz">bacteria.1.1.genomic.fna.gz</a> (112297068 bytes)
2022 May 05 22:46 File <a href="ftp://ftp.ncbi.nih.gov:21/refseq/release/bacteria/bacteria.1.genomic.gbff.gz">bacteria.1.genomic.gbff.gz</a> (90133141 bytes)
2022 May 05 22:46 File <a href="ftp://ftp.ncbi.nih.gov:21/refseq/release/bacteria/bacteria.10.1.genomic.fna.gz">bacteria.10.1.genomic.fna.gz</a> (2647137 bytes)
2022 May 05 22:46 File <a href="ftp://ftp.ncbi.nih.gov:21/refseq/release/bacteria/bacteria.10.genomic.gbff.gz">bacteria.10.genomic.gbff.gz</a> (2242108 bytes)
2022 May 05 22:45 File <a href="ftp://ftp.ncbi.nih.gov:21/refseq/release/bacteria/bacteria.100.1.genomic.fna.gz">bacteria.100.1.genomic.fna.gz</a> (94215320 bytes)
2022 May 05 22:46 File <a href="ftp://ftp.ncbi.nih.gov:21/refseq/release/bacteria/bacteria.100.genomic.gbff.gz">bacteria.100.genomic.gbff.gz</a> (82718913 bytes)
2022 May 05 22:46 File <a href="ftp://ftp.ncbi.nih.gov:21/refseq/release/bacteria/bacteria.1000.1.genomic.fna.gz">bacteria.1000.1.genomic.fna.gz</a> (120745492 bytes)
2022 May 05 22:46 File <a href="ftp://ftp.ncbi.nih.gov:21/refseq/release/bacteria/bacteria.1000.genomic.gbff.gz">bacteria.1000.genomic.gbff.gz</a> (104026228 bytes)
2022 May 05 22:46 File <a href="ftp://ftp.ncbi.nih.gov:21/refseq/release/bacteria/bacteria.1001.1.genomic.fna.gz">bacteria.1001.1.genomic.fna.gz</a> (120374412 bytes)
2022 May 05 22:46 File <a href="ftp://ftp.ncbi.nih.gov:21/refseq/release/bacteria/bacteria.1001.genomic.gbff.gz">bacteria.1001.genomic.gbff.gz</a> (102935563 bytes)
2022 May 05 22:46 File <a href="ftp://ftp.ncbi.nih.gov:21/refseq/release/bacteria/bacteria.1002.1.genomic.fna.gz">bacteria.1002.1.genomic.fna.gz</a> (116037436 bytes)
2022 May 05 22:46 File <a href="ftp://ftp.ncbi.nih.gov:21/refseq/release/bacteria/bacteria.1002.genomic.gbff.gz">bacteria.1002.genomic.gbff.gz</a> (100084701 bytes)
2022 May 05 22:45 File <a href="ftp://ftp.ncbi.nih.gov:21/refseq/release/bacteria/bacteria.1003.1.genomic.fna.gz">bacteria.1003.
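One possible way to use a listing like the one above is to pull out only the links ending in .genomic.fna.gz (the FASTA files) and hand them to wget. This is a sketch, not a tested pipeline: the function name extract_fasta_urls and the file names sample_index.html and fasta_urls.txt are made up for illustration, and the sample listing is a three-line excerpt so the snippet can be tried offline.

```shell
# Small excerpt of the listing, so the sketch runs without network access.
cat > sample_index.html <<'EOF'
2022 May 05 22:46 File <a href="ftp://ftp.ncbi.nih.gov:21/refseq/release/bacteria/bacteria.1.1.genomic.fna.gz">bacteria.1.1.genomic.fna.gz</a> (112297068 bytes)
2022 May 05 22:46 File <a href="ftp://ftp.ncbi.nih.gov:21/refseq/release/bacteria/bacteria.1.genomic.gbff.gz">bacteria.1.genomic.gbff.gz</a> (90133141 bytes)
2022 May 05 22:46 File <a href="ftp://ftp.ncbi.nih.gov:21/refseq/release/bacteria/bacteria.10.1.genomic.fna.gz">bacteria.10.1.genomic.fna.gz</a> (2647137 bytes)
EOF

# Hypothetical helper: print every href that ends in .genomic.fna.gz.
extract_fasta_urls() {
    grep -o 'href="[^"]*\.genomic\.fna\.gz"' "$1" | sed -e 's/^href="//' -e 's/"$//'
}

extract_fasta_urls sample_index.html > fasta_urls.txt

# Actual download (needs network, so it is left commented out).
# --continue resumes partially downloaded files if the transfer is interrupted.
# wget --continue --input-file=fasta_urls.txt
```

Running the same function on the real index.html should give one URL per FASTA chunk; the GenBank (.gbff.gz) links are skipped.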
The FTP file you are pointing to is for a single bacterial genome, is that right?
Can't we get access to a single .gz file on the FTP site with all bacterial and viral genomes?
I'm unable to download such data directly to my server. After a few minutes, the program stops without showing an error message.
Of course it is possible - that is what the tool is meant for. I just did an incremental update of my archaeal database.
genome_updater.sh works well @ Mensur, good post.
No, you can't.
This is where the datasets tool from NCBI can come in handy. Here is an example of all viral genomes: https://www.ncbi.nlm.nih.gov/datasets/genomes/?taxon=10239 <-- This example is only for viewing. You will need to use the command-line datasets tool to do the actual download, since the web tool is limited to 1000 genomes.
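A rough sketch of what the command-line download could look like, assuming a recent version of the datasets CLI in which the --include, --dehydrated and --filename options and the rehydrate subcommand are available. The wrapper functions run and download_taxon and the output name viral_genomes are hypothetical. With DRY_RUN=1 the commands are only printed, so nothing is downloaded until you unset it.

```shell
# Hypothetical dry-run switch: print the command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the NCBI datasets CLI.
download_taxon() {
    taxon="$1"    # NCBI taxonomy ID, e.g. 10239 for viruses
    outdir="$2"   # output directory name (also used for the zip file)
    # Fetch a small "dehydrated" package first (metadata plus file locations).
    run datasets download genome taxon "$taxon" --include genome --dehydrated --filename "$outdir.zip"
    run unzip -q "$outdir.zip" -d "$outdir"
    # Then fetch the actual sequence files into the unpacked package.
    run datasets rehydrate --directory "$outdir"
}

DRY_RUN=1
download_taxon 10239 viral_genomes
```

The dehydrated route is a design choice for very large sets: the initial zip is small, and the sequence files are fetched in a separate step that can be run again if the connection drops.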