Hi!
I want to download the rRNA sequences for all bacteria from the database at http://ftp.ebi.ac.uk/pub/databases/metagenomics/mgnify_genomes/human-gut/v2.0.1/. I know that I can do this recursively with wget, but I don't understand exactly how. I need to go into a folder and recursively walk through its subfolders.
Inside folder1 there are many subfolders, each of which contains two folders (one of which is "genome"). There are many files in the "genome" folder, but I only need FileName_rRNAs.fasta. Sometimes that file may not be there.
An example download path might look like this:
species_catalogue/MGYG0000000/MGYG000000001/genome/MGYG000000001_rRNAs.fasta
Maybe you know what wget command I need?
I would be grateful for any help!
That web/FTP site uses a robots file that prevents crawling of the site. As a result, you will not be able to use wget to crawl it, and one is supposed to respect this setting. That said, the folders have a specific URL structure that should allow you to build direct URLs and loop over them to download the fasta files you are looking for. I have confirmed that this works. In any case, you should be a good citizen and put in appropriate pauses if you choose to download with this kind of method.
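A minimal Python sketch of that direct-URL loop, not a tested recipe. It assumes the path pattern in the question's example holds for every genome, i.e. that the intermediate folder name is the accession without its last two digits (inferred from that single example). The function names rrna_url and download_rrnas, the output folder, and the 2-second pause are hypothetical choices, and you need to supply the list of accessions yourself.

```python
import time
import urllib.error
import urllib.request
from pathlib import Path

# Base URL taken from the question.
BASE_URL = ("http://ftp.ebi.ac.uk/pub/databases/metagenomics/"
            "mgnify_genomes/human-gut/v2.0.1/")


def rrna_url(accession, base_url=BASE_URL):
    """Build the direct URL of the rRNA fasta for one genome accession.

    Assumption: the intermediate folder is the accession minus its last
    two characters, as in the example path
    species_catalogue/MGYG0000000/MGYG000000001/genome/...
    """
    bucket = accession[:-2]
    return (f"{base_url}species_catalogue/{bucket}/{accession}/"
            f"genome/{accession}_rRNAs.fasta")


def download_rrnas(accessions, out_dir="rrna_fasta", pause_seconds=2.0):
    """Download one rRNA fasta per accession, pausing between requests.

    Returns the accessions whose file was not found (HTTP 404), since
    the rRNA file is sometimes absent.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    missing = []
    for acc in accessions:
        target = out / f"{acc}_rRNAs.fasta"
        if target.exists():
            continue  # already downloaded, no request made
        try:
            urllib.request.urlretrieve(rrna_url(acc), target)
        except urllib.error.HTTPError as err:
            if err.code != 404:
                raise  # anything other than "not found" is a real error
            missing.append(acc)
        time.sleep(pause_seconds)  # be polite to the server
    return missing
```

Calling download_rrnas(["MGYG000000001"]) would try to fetch the file from the question's example into the rrna_fasta folder and return a list of any accessions that have no rRNA file.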
Use the REST API via the Python toolkit to download the files: https://pypi.org/project/mg-toolkit/