I have a bunch of UniProt proteome IDs that I would like to download in an automated way, e.g.
UP000005640
UP000001073
UP000001519
It sounds like an easy task, but after many tries I'm still unsuccessful. Does anyone have a trick?
- I tried the Perl approach they describe here, without success: https://www.uniprot.org/help/api_downloading
- I tried a for loop around the wget command, but I need a wildcard or regex and I cannot get it to work: wget --regex-type .fasta.gz https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/reference_proteomes/Eukaryota/UP000000226/UP000000226_*.fasta.gz
- I tried via the website, but I do not get the FASTA sequences, only a general description.
Great, so my problem was that I was using https://www.uniprot.org/proteomes instead of https://www.uniprot.org/uniprotkb, and the two do not deliver the same data. It seems odd to me that one cannot download a FASTA proteome from the proteomes side of UniProt! I finally used the API URL with the snippet they provide: https://www.uniprot.org/help/api_queries (2.2 Large number of results: use pagination).
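For anyone landing here later, the pagination idea can be sketched roughly as follows. This is not the snippet from the help page (that one uses the requests library); it is a standard-library approximation. It assumes the search endpoint is https://rest.uniprot.org/uniprotkb/search, that it accepts the query proteome:<ID>, and that each response carries a Link header pointing to the next page. The function names and the page size of 500 are my own choices.

```python
import re
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the UniProtKB search service.
SEARCH_URL = "https://rest.uniprot.org/uniprotkb/search"
# Matches a header such as: <https://...cursor=abc>; rel="next"
NEXT_LINK = re.compile(r'<([^>]+)>;\s*rel="next"')


def next_page_url(link_header):
    """Return the URL of the next page, or None if there is no next page."""
    if not link_header:
        return None
    match = NEXT_LINK.search(link_header)
    return match.group(1) if match else None


def iter_fasta_pages(proteome_id, page_size=500):
    """Yield FASTA text one page at a time for all entries of a proteome."""
    params = {"query": f"proteome:{proteome_id}", "format": "fasta", "size": page_size}
    url = f"{SEARCH_URL}?{urlencode(params)}"
    while url:
        with urlopen(url) as response:
            text = response.read().decode("utf-8")
            # Follow the cursor-based link until the server stops sending one.
            url = next_page_url(response.headers.get("Link"))
        yield text
```

Writing every page yielded by iter_fasta_pages("UP000005640") to a single file should give the full proteome in FASTA format.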
You can download the members of a proteome from its proteome page: at https://www.uniprot.org/proteomes/UP000005640, you can click on the number after "Protein count" and download from there, or click directly on "Download" at the top of the component table.
For programmatic access, there are some example scripts at https://www.uniprot.org/help/api_downloading, e.g. "Download the UniProt reference proteomes for all organisms below a given taxonomy node in compressed FASTA format".
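If you only have a list of proteome IDs, a minimal Python sketch along these lines may be enough. It is not one of the scripts from the help page. It assumes the streaming endpoint https://rest.uniprot.org/uniprotkb/stream with the parameters query=proteome:<ID>, format=fasta and compressed=true, and the helper names and output file naming are hypothetical.

```python
import shutil
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the UniProtKB streaming service.
STREAM_URL = "https://rest.uniprot.org/uniprotkb/stream"


def proteome_fasta_url(proteome_id, compressed=True):
    """Build the download URL for all UniProtKB entries of one proteome."""
    params = {"query": f"proteome:{proteome_id}", "format": "fasta"}
    if compressed:
        params["compressed"] = "true"
    return f"{STREAM_URL}?{urlencode(params)}"


def download_proteome(proteome_id, out_dir="."):
    """Save the gzipped FASTA of one proteome as <out_dir>/<ID>.fasta.gz."""
    out_path = Path(out_dir) / f"{proteome_id}.fasta.gz"
    with urlopen(proteome_fasta_url(proteome_id)) as response, open(out_path, "wb") as handle:
        # Copy the response body to disk without loading it all into memory.
        shutil.copyfileobj(response, handle)
    return out_path
```

Calling download_proteome(pid) in a loop over UP000005640, UP000001073 and UP000001519 should then fetch one file per proteome.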