Question

Download rRNAs.fasta for all bacteria from database

0

2.0 years ago

kamanovae ▴ 100

Hi!

I want to download rRNA sequences for all bacteria from the database at http://ftp.ebi.ac.uk/pub/databases/metagenomics/mgnify_genomes/human-gut/v2.0.1/. I know this can be done recursively with wget, but I don't understand exactly how. I need to go into a folder and then recursively go through its subfolders.

Inside folder1 there are many subfolders, each of which contains two folders (one of which is "genome"). The "genome" folder holds many files, but I only need the one named like FileName_rRNAs.fasta. Sometimes that file is missing.

An example download path might look like this:

species_catalogue/MGYG0000000/MGYG000000001/genome/MGYG000000001_rRNAs.fasta

Maybe you know what wget command I need?

I would be grateful for any help!

wget NCBI • 977 views

updated 2.0 years ago by Ram 45k • written 2.0 years ago by kamanovae ▴ 100

1

That web/FTP site uses a robots.txt file that disallows crawling. As a result, you would not be able to use wget to crawl it. One is supposed to respect this setting.

That said, the folders follow a specific URL structure, so you can construct direct URLs and download the fasta files you are looking for in a loop. I have confirmed that this works. In any case, be a good citizen and put appropriate pauses between requests if you download this way.
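A minimal Python sketch of that idea, in case it helps. It assumes the URL pattern from the example path in the question (the subfolder name is taken to be the genome ID without its last two digits) and that you already have a list of genome IDs from somewhere else. The names `rrna_url`, `download_rrnas`, `out_dir` and `pause_seconds` are made up for this illustration, and the one-second pause is only a guess at what counts as polite.

```python
import time
import urllib.error
import urllib.request
from pathlib import Path

BASE_URL = (
    "http://ftp.ebi.ac.uk/pub/databases/metagenomics/"
    "mgnify_genomes/human-gut/v2.0.1"
)


def rrna_url(genome_id: str, base_url: str = BASE_URL) -> str:
    """Build a direct URL for one genome's rRNA fasta file.

    Assumption: the layout matches the example path in the question, where
    the parent folder is the genome ID minus its last two characters.
    """
    bucket = genome_id[:-2]
    return (
        f"{base_url}/species_catalogue/{bucket}/{genome_id}"
        f"/genome/{genome_id}_rRNAs.fasta"
    )


def download_rrnas(genome_ids, out_dir="rrnas", pause_seconds=1.0):
    """Download each rRNA fasta file; return the IDs that had no such file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    missing = []
    for genome_id in genome_ids:
        target = out / f"{genome_id}_rRNAs.fasta"
        try:
            with urllib.request.urlopen(rrna_url(genome_id), timeout=60) as resp:
                target.write_bytes(resp.read())
        except urllib.error.HTTPError as err:
            if err.code == 404:
                # The file is sometimes absent, as noted in the question.
                missing.append(genome_id)
            else:
                raise
        # Pause between requests to avoid hammering the server.
        time.sleep(pause_seconds)
    return missing
```

Calling `download_rrnas(["MGYG000000001"])` would try to fetch the single file from the example path into a local `rrnas` folder and return a list of any IDs for which the server answered 404.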

2.0 years ago by GenoMax 152k

1

Use the REST API via the Python toolkit to download the files: https://pypi.org/project/mg-toolkit/

2.0 years ago by Mark ★ 1.7k