Hi everyone, I am new to bioinformatics and working on my thesis project, which requires me to download the reference bacterial genomes listed in ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/assembly_summary.txt. I am super green in this field, so I really don't know what I am doing. Here is the code I was given to download the genome files (the *genomic.fna.gz FASTA files):
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/assembly_summary.txt
grep 'Complete Genome' assembly_summary.txt \
    > assembly_summary_complete_latest_reference_genomes.txt
awk -F "\t" '$12=="Complete Genome" && $11=="latest"{print $20}' assembly_summary.txt \
    > assembly_summary_complete_latest_reference_genomes_paths.txt
mkdir BacterialGenomes
for i in $(cat assembly_summary_complete_latest_reference_genomes_paths.txt)
do
    wget -P BacterialGenomes ${i}/*genomic.fna.gz
done
When I run this script, I get stuck in what looks like an infinite loop, with the same error messages repeating for each genome (posted below). I am on Ubuntu Linux, in case anyone is wondering.
Warning: wildcards not supported in HTTP.
--2021-12-25 21:14:23--  https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/157/365/GCA_002157365.2_ASM215736v2/*genomic.fna.gz
Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 165.112.9.230, 130.14.250.10, 2607:f220:41f:250::230, ...
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|165.112.9.230|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2021-12-25 21:14:23 ERROR 404: Not Found.
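The warning in the log suggests that the paths in column 20 are now https:// URLs, and wget only expands wildcards such as *genomic.fna.gz over FTP, so each request is sent literally and comes back 404. One possible workaround, sketched here and not tested against the live server, is to build the exact file name instead of globbing. It assumes the usual NCBI layout in which the genomic FASTA is named after its directory (`<directory name>_genomic.fna.gz`). The helper names `genomic_url` and `download_genomes` are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: genomic_url and download_genomes are hypothetical helper names.

# Build the full URL of the genomic FASTA from an assembly directory URL.
# Assumption: the file is called <last path component>_genomic.fna.gz.
genomic_url() {
    local dir="${1%/}"        # drop a trailing slash, if any
    local name="${dir##*/}"   # last path component, e.g. GCA_002157365.2_ASM215736v2
    printf '%s/%s_genomic.fna.gz\n' "$dir" "$name"
}

# Read one directory URL per line and download each genomic FASTA.
download_genomes() {
    local paths_file="$1" outdir="$2"
    mkdir -p "$outdir"
    while IFS= read -r dir; do
        # Skip blank lines and "na" placeholders (assumed to mark missing paths).
        [ -n "$dir" ] || continue
        [ "$dir" != "na" ] || continue
        # -nc skips files that already exist, -P sets the output directory.
        wget -nc -P "$outdir" "$(genomic_url "$dir")"
    done < "$paths_file"
}

# Example call (left commented out so that sourcing this file downloads nothing):
# download_genomes assembly_summary_complete_latest_reference_genomes_paths.txt BacterialGenomes
```

Using the exact name also avoids picking up other files that end in genomic.fna.gz, such as the `_cds_from_genomic.fna.gz` and `_rna_from_genomic.fna.gz` files that a wildcard would match on an FTP listing.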
Thank you for any help you may be able to provide!
I also tried doing this in R using
but I get an error saying