Hi,
I want to download protein FASTA files for a set of bird species, and I have the genome assembly accessions in a file. Every time I need to bulk-download FASTA files I seem to have forgotten how I did it last time, and the databases have all changed their websites/interfaces. I used the NCBI datasets command-line tool to download the files. However, datasets gives each accession its own folder containing "protein.faa". What I want is a single folder of FASTA files that I can then use with OrthoFinder and other programs. A few hundred folders that each contain a file with the same name is essentially useless to me. Does anyone know the best way to download these files (ideally one that will stay the best way, so I can reuse it in the future), or how to make use of the downloads from datasets? Thank you.
Assuming you are on Linux and all your downloaded directories are in /home/emawhitt/fasdls, simply run:
mv /home/emawhitt/fasdls/*/*.faa /home/emawhitt/fasdls
This will move all the .faa files to /home/emawhitt/fasdls.
Thank you. Is there some way to rename the files too? All the files are called protein.faa.
You could look at:
https://stackoverflow.com/questions/16266930/how-to-rename-files-in-folders-to-foldername-using-batch-file
https://askubuntu.com/questions/746860/rename-a-file-to-parent-directorys-name-in-terminal
https://askubuntu.com/questions/759422/rename-files-adding-their-parent-folder-name
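Since every file is called protein.faa, the plain mv above would make them clash in the target folder. One way round this, along the lines of the linked "rename a file to its parent directory's name" answers, is to rename each file in place first. This is a minimal sketch, assuming the per-accession folders sit directly under the given path; the function name rename_to_parent is hypothetical, and the path is the one used earlier in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rename each <root>/<dir>/protein.faa to <root>/<dir>/<dir>.faa,
# i.e. give every file the name of its parent folder (the assembly accession).
rename_to_parent() {
    local root="$1" f dir
    for f in "$root"/*/protein.faa; do
        [ -e "$f" ] || continue                      # glob matched nothing: skip
        dir=$(dirname -- "$f")                       # e.g. <root>/<accession>
        mv -- "$f" "$dir/$(basename -- "$dir").faa"  # -> <root>/<accession>/<accession>.faa
    done
}

# Assumed location of the per-accession folders; only runs if it exists.
if [ -d "/home/emawhitt/fasdls" ]; then
    rename_to_parent "/home/emawhitt/fasdls"
fi
```

After this, the earlier mv /home/emawhitt/fasdls/*/*.faa command moves files with distinct names, so nothing collides.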
This would move and rename the files at once (it might be a bit slow, depending on how many files you have). Just replace /home/emawhitt/fasdls, which is currently assigned to MYPATH, with the path to the directory containing all the protein.faa files (in their respective sub-directories). MirianT_NCBI's solution below might be a bit faster, though.
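For reference, a move-and-rename in one pass could look roughly like the sketch below. It is not the script referred to above, just an illustration under stated assumptions: MYPATH holds one sub-directory per accession, each containing protein.faa (for an unzipped datasets download this may be the ncbi_dataset/data folder, but check your own layout), and the function name collect_faa and output folder name proteomes are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move every <mypath>/<dir>/protein.faa into <outdir>/<dir>.faa.
collect_faa() {
    local mypath="$1" outdir="$2" f acc
    mkdir -p -- "$outdir"
    for f in "$mypath"/*/protein.faa; do
        [ -e "$f" ] || continue                       # glob matched nothing: skip
        acc=$(basename -- "$(dirname -- "$f")")       # parent folder name = accession
        if [ -e "$outdir/$acc.faa" ]; then            # never overwrite an existing file
            echo "skipping $f: $outdir/$acc.faa already exists" >&2
            continue
        fi
        mv -- "$f" "$outdir/$acc.faa"
    done
}

MYPATH="/home/emawhitt/fasdls"   # assumed path: folder with the per-accession sub-directories
if [ -d "$MYPATH" ]; then
    collect_faa "$MYPATH" "$MYPATH/proteomes"   # hypothetical output folder
fi
```

Keeping the accession as the file name means each FASTA file in the single output folder can be traced back to its assembly, which is convenient when pointing OrthoFinder at that folder.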