6.3 years ago
sharmatina189059
How can I download all protein sequences of complete genome sequences of Acinetobacter baumannii from the NCBI FTP site?
OP needs to get *protein.faa.gz files since protein data is needed. OP, take a look at the help for ncbi-genome-download. Give the option --format protein-fasta to get what you want (or download the genome or CDS data and transform it yourself).
I am running this command
ncbi-genome-download -l complete,chromosome bacteria --genus "Acinetobacter baumannii" --format protein-fasta
but this gives me MD5SUMS file names like this. I need FASTA sequences.
They should be included in the *.faa files.
The MD5sums are always provided. They correspond to the files you need, which should be present in a folder named GCF_000...
Your command is also wrong. complete,chromosome is not one argument to the --assembly-level option. You should specify one or the other. Similarly, bacteria is a positional argument and should come last in the command. Make sure you read the documentation on the GitHub page.
Try:
or
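A sketch consistent with the advice above, not necessarily the exact commands originally posted: one command per assembly level, with the positional group bacteria placed last. The helper name build_cmd is hypothetical, and the flag spellings (-l, --genus, --format) are simply copied from the command in the question, so check them against ncbi-genome-download --help for your installed version.

```shell
# Sketch only: flag spellings (-l, --genus, --format) are copied from the
# command in the question; confirm them with `ncbi-genome-download --help`.

# build_cmd LEVEL: print a download command for a single assembly level,
# with the positional group "bacteria" placed last.
build_cmd() {
    level="$1"
    printf 'ncbi-genome-download -l %s --genus "Acinetobacter baumannii" --format protein-fasta bacteria\n' "$level"
}

# Print one command per assembly level so each can be inspected first.
build_cmd complete
build_cmd chromosome
```

The block only prints the commands. Once a printed line looks right, it can be pasted into the terminal to start the actual download.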
I got:
You may still end up with some empty folders, so you'll need to pull out all the FASTA files separately afterwards with something like
find ./ -name "*.faa.gz"
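Building on that find call, a minimal sketch for gathering the protein files into a single folder might look like the following. The function names collect_faa and count_faa and both directory arguments are hypothetical; the sketch assumes each file name is unique (true if every name starts with its own assembly accession) and that the destination lies outside the source tree.

```shell
# collect_faa SRC DEST: copy every *.faa.gz found under SRC into DEST.
# Both directory names are placeholders; DEST should sit outside SRC.
collect_faa() {
    src="$1"
    dest="$2"
    mkdir -p "$dest"
    find "$src" -type f -name "*.faa.gz" -exec cp {} "$dest"/ \;
}

# count_faa DIR: report how many protein FASTA archives DIR contains.
count_faa() {
    find "$1" -type f -name "*.faa.gz" | wc -l | tr -d ' '
}
```

For example, collect_faa path/to/downloads path/to/all_proteins followed by count_faa path/to/all_proteins gives a quick check of how many assemblies actually yielded a protein file. Empty folders contribute nothing.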
Hi Joe, hope you are safe and well.
Why does this work:
And this script doesn't?
My list of genera (example):
I used it to download all genomic FASTA files and it works just fine! Thanks
I'm not at a computer to test this at the moment, but my guess would be that your loop isn't synthesising the command properly. It may be the quotes around cds-fasta. Check your command is well formed and introduce the flags one by one in the loop to narrow down the issue.
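One way to check what a loop is really passing is to print every argument on its own line before running anything. This is a rough sketch, not the script from the post: show_args and preview_genera are hypothetical helpers, the genus list is assumed to be a plain text file with one genus per line, and the flag spellings are taken from the command earlier in the thread.

```shell
# show_args: print each argument on its own line, in brackets, so a value
# that should be one argument (e.g. a genus with a space) is easy to check.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# preview_genera FILE FORMAT: for each non-empty line of FILE (one genus per
# line), show the arguments that would be passed to ncbi-genome-download.
# Flag spellings follow the command in the question and are not verified here.
preview_genera() {
    while IFS= read -r genus || [ -n "$genus" ]; do
        [ -n "$genus" ] || continue
        echo "ncbi-genome-download"
        show_args -l complete --genus "$genus" --format "$2" bacteria
    done < "$1"
}
```

Calling preview_genera genera.txt cds-fasta should show the format as a single bracketed cds-fasta entry with no stray quote characters. If literal quotes appear inside the brackets, the quoting in the loop is the likely culprit.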
Yeah that worked. Thanks man. Paulo