Hello, biostars,
I want to download the accession numbers of all bacterial proteins.
From https://www.ncbi.nlm.nih.gov/protein/?term=Bacteria, the route
Send to --> File --> Format (Accession List) --> Create File
does not seem to work for bacteria (I tested it with viruses and archaea, and it works perfectly).
After that, I tried to extract the full list of accession numbers from the command line, but I could not do so.
Even the command NCBI proposes for genomes doesn't seem to work. The "command-line tool" option at https://www.ncbi.nlm.nih.gov/protein/?term=Bacteria gives
datasets download genome taxon 2 --filename bacteria.zip
With it I got this error: unknown flag: --filename
I also tried changing parts of the command, such as genome to gene, like
datasets download gene taxon 2 --filename bacteria.zip
but it downloads the gene with ID 2 (it parses the term taxon).
I also tried
curl 'ftp://ftp.ncbi.nlm.nih.gov/protein/?term=bacteria%5BAll+Fields%5D'
Does anybody have an idea how to handle this issue?
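One route that may work from the command line is NCBI's E-utilities rather than datasets: run the search once with esearch, keep the result set on the history server, then page through it with efetch using rettype=acc. Below is a minimal Python sketch under stated assumptions. The query txid2[Organism:exp] (taxon 2, as in the datasets command), the function names, the batch size and the pause are illustrative choices of mine, not something NCBI prescribes for this case. The bacterial protein set is very large, so a full run could take a long time, and I have not tried it at that scale.

```python
import json
import time
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, db="protein"):
    """Build an esearch URL that stores the hits on NCBI's history server."""
    params = {"db": db, "term": term, "usehistory": "y",
              "retmax": 0, "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urllib.parse.urlencode(params)}"


def parse_esearch(json_text):
    """Return (hit count, WebEnv, query key) from an esearch JSON reply."""
    result = json.loads(json_text)["esearchresult"]
    return int(result["count"]), result["webenv"], result["querykey"]


def efetch_url(webenv, query_key, retstart, retmax, db="protein"):
    """Build an efetch URL for one batch of accessions as plain text."""
    params = {"db": db, "WebEnv": webenv, "query_key": query_key,
              "rettype": "acc", "retmode": "text",
              "retstart": retstart, "retmax": retmax}
    return f"{EUTILS}/efetch.fcgi?{urllib.parse.urlencode(params)}"


def download_accessions(term, out_path, batch=10000, pause=0.4):
    """Write the accessions matching `term` to out_path, batch by batch.

    batch and pause are assumptions: efetch returns at most 10,000 records
    per request, and NCBI asks for at most 3 requests per second when no
    API key is used.
    """
    with urllib.request.urlopen(esearch_url(term)) as resp:
        count, webenv, query_key = parse_esearch(resp.read().decode())
    with open(out_path, "w") as out:
        for start in range(0, count, batch):
            url = efetch_url(webenv, query_key, start, batch)
            with urllib.request.urlopen(url) as resp:
                out.write(resp.read().decode())
            time.sleep(pause)  # stay under the request rate limit
    return count
```

Calling download_accessions("txid2[Organism:exp]", "bacteria_acc.txt") should write the accessions to that file. I would try it on a small taxon first to check that the output looks right before pointing it at all of Bacteria.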
A related Python script that you could use (search by FASTA title): How to download all sequences of a list of proteins for a particular organism
Thanks for the response. I will use the script if I need to download the respective sequences. Again, thanks a lot for the script :D
AFAIK, datasets is only meant to work with genome-level data. Doing [...] will get you information about bacterial genome accessions. You can use
Thanks again for the response. I knew that it was based on the genome level, but I saw a gene option, so my idea was to download all the genes and afterwards extract the ACC numbers... I know that my idea was a bit stupid and complicated :P
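For what it's worth, the "download first, extract the ACC numbers afterwards" idea is easy enough once a FASTA file is on disk. A minimal sketch, assuming each header line starts with the accession as its first whitespace-separated token (the usual layout of NCBI protein FASTA, but worth checking on the actual file). The function name and the sample records are made up for illustration.

```python
def accessions_from_fasta(lines):
    """Collect the first token of every FASTA header line.

    Assumes headers look like '>ACCESSION.VERSION description ...'.
    """
    accessions = []
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:  # skip a bare '>' with no identifier
                accessions.append(tokens[0])
    return accessions


# Made-up records, only to show the expected header layout.
sample = [
    ">WP_012345678.1 hypothetical protein [Escherichia coli]",
    "MKTAYIAKQR",
    ">ABC12345.1 made-up protein",
    "MSLNNE",
]
print(accessions_from_fasta(sample))
```

The function accepts any iterable of lines, so an open file handle can be passed in directly instead of the sample list.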