I am looking to download a specific bunch of full genomes from NCBI https://www.ncbi.nlm.nih.gov/assembly/
I typed in my organism of interest and then, using the 'Download assemblies' option, I was able to download all the available GenBank files. My issue is that these files are named by accession number or project number, so I am asking: is it possible to download the same files but with the organism name as the file name instead of the accession/project number?
`for f in *.gbff; do d="$(grep -m1 ORGANISM "$f" | awk '{$1=""; print $0}' | sed 's/^ *//; s/ /_/g').gbff"; if [ ! -f "$d" ]; then mv "$f" "$d"; else echo "File '$d' already exists! Skipped '$f'"; fi; done`
* Explanation:
Usually, the ORGANISM field of a GenBank file contains 2-3 words (depending on the nomenclature). The code above takes every word after ORGANISM, whatever their number, joins them with underscores so that the file name contains no spaces, and adds the `.gbff` extension.
For example, a GenBank file whose ORGANISM line reads `Homo sapiens` is renamed to `Homo_sapiens.gbff`.
The extra steps are needed because grep returns the whole line, `  ORGANISM  Homo sapiens`, which contains the word ORGANISM and some spaces. (I tried grep's -oP option to hide the word ORGANISM but it didn't work for me, so instead awk blanks the first field, ORGANISM, and prints the rest, and sed removes the leading spaces and replaces the remaining ones with underscores.)
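For reference, the same renaming can be sketched with sed alone (no grep/awk), wrapped in functions so that files without an ORGANISM line are skipped instead of being renamed to a bare `.gbff`. This is only a sketch: the function names are hypothetical, and it assumes uncompressed `.gbff` files. A `.gbff` file with several records has one ORGANISM line per record, so only the first one is used, and `/` is also replaced, since some organism names (e.g. virus strains) contain it.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper names): rename GenBank flat files after the
# organism given on their first ORGANISM line.

# Print the first ORGANISM value, made filename-safe:
# "  ORGANISM  Homo sapiens" -> "Homo_sapiens"
organism_from_gbff() {
  sed -n 's/^ *ORGANISM *//p' "$1" | head -n 1 | sed 's/ *$//; s|[ /]|_|g'
}

# Rename one file; skip it if it has no ORGANISM line or the target exists.
rename_gbff_by_organism() {
  local f="$1" name d
  name="$(organism_from_gbff "$f")"
  if [ -z "$name" ]; then
    echo "No ORGANISM line in '$f', skipped" >&2
    return 1
  fi
  d="$(dirname "$f")/$name.gbff"
  if [ -e "$d" ]; then
    echo "File '$d' already exists! Skipped '$f'" >&2
    return 1
  fi
  mv "$f" "$d"
}
```

Usage: `for f in *.gbff; do rename_gbff_by_organism "$f"; done`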
As you may know, the answer to your specific question is no.
Since you have already downloaded the files with accession number names, it should be easy to rename them after the fact.
Possible solutions:
https://stackoverflow.com/questions/54078687/automatically-rename-fasta-files-with-the-id-of-the-first-sequence-in-each-file
https://stackoverflow.com/questions/53094543/rename-genome-fasta-files-with-part-of-sequence-header
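In the spirit of the second link, a FASTA file can be renamed after part of its first header. This sketch assumes NCBI-style headers such as `>NC_000913.3 Escherichia coli str. K-12 substr. MG1655, complete genome`: it drops the sequence ID and cuts at the first comma. That header layout is an assumption and will not hold for every file; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper names): rename a FASTA file after the organism
# part of its first header, assuming ">ID Organism name, description".

organism_from_fasta() {
  # First header only: drop ">ID ", cut at the first comma, make it filename-safe.
  awk '/^>/ { sub(/^>[^ ]* */, ""); sub(/,.*/, ""); gsub("[ /]", "_"); print; exit }' "$1"
}

rename_fasta_by_organism() {
  local f="$1" name base ext d
  name="$(organism_from_fasta "$f")"
  if [ -z "$name" ]; then
    echo "No usable header in '$f', skipped" >&2
    return 1
  fi
  # Keep the original extension (e.g. .fna, .fasta) if there is one.
  base="$(basename "$f")"
  case "$base" in
    *.*) ext=".${base##*.}" ;;
    *)   ext="" ;;
  esac
  d="$(dirname "$f")/$name$ext"
  if [ -e "$d" ]; then
    echo "File '$d' already exists! Skipped '$f'" >&2
    return 1
  fi
  mv "$f" "$d"
}
```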
OK, I see. Thanks for your response. I was hoping for something that works before downloading my files, but it's OK. The links you posted are useful for getting the idea. I think if I adapt the commands for the FASTA files, I can apply them to the GenBank ones. ;-)
You could script the initial downloads so that files are renamed right after they are downloaded, but since you already have the files in hand, the solutions above avoid having to re-download the data.
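Renaming right after each download could look like the sketch below. It assumes the downloaded GenBank files are gzip-compressed (`.gbff.gz`, as NCBI usually serves them), so the organism is read with `gzip -dc` without unpacking the file on disk. The download command itself is left out, and `rename_downloaded_gbff_gz` is a hypothetical helper you would call on each file as soon as it arrives.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): rename a freshly downloaded *.gbff.gz after
# the organism on its first ORGANISM line, keeping it compressed.
rename_downloaded_gbff_gz() {
  local f="$1" name d
  # Stream the file; sed quits at the first ORGANISM line, so large files
  # are not read to the end. Spaces and "/" become underscores.
  name="$(gzip -dc "$f" | sed -n '/^ *ORGANISM/{s/^ *ORGANISM *//p;q;}' \
          | sed 's/ *$//; s|[ /]|_|g')"
  if [ -z "$name" ]; then
    echo "No ORGANISM line in '$f', kept its name" >&2
    return 1
  fi
  d="$(dirname "$f")/$name.gbff.gz"
  if [ -e "$d" ]; then
    echo "File '$d' already exists! Skipped '$f'" >&2
    return 1
  fi
  mv "$f" "$d"
}
```

The same helper also works on files already in hand, e.g. `for f in *.gbff.gz; do rename_downloaded_gbff_gz "$f"; done`.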