2.0 years ago
Francesco
Hi, I have to retrieve all reviewed proteins from UniProt for a list of species names (stored in a CSV file, 646 species). I tried to use the UniProt API service and unipressed (a Python library). This is the script I wrote:
from unipressed import UniprotkbClient
import pandas as pd

data_df = pd.read_csv('organisms.csv', header=0, sep=',')
species_names = data_df['Species']
species_names = species_names.dropna()

for record in UniprotkbClient.search(
    query={
        'organism_name': s for s in species_names
    }
).each_record():
    display(record)
Unfortunately, the for loop does not retrieve the proteins of all species: it downloads only the proteins of the last species in the list. I also tried to add an AND condition to download only reviewed proteins, but I get an error message. Please help me :')
Have you seen the examples that UniProt has for Python queries? https://www.uniprot.org/help/api_queries
If you are trying to download a large amount of data from public resources, you will want to put some pauses between the species to avoid overloading the servers or getting your IP banned.
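As a rough illustration of the advice above, here is a minimal sketch that sends one request per species and sleeps between requests. It uses only the standard library and the UniProt REST search endpoint rather than unipressed. The helper names (`build_query`, `build_url`, `fetch_first_page`, `fetch_all`) and the one-second pause are my own assumptions, not anything UniProt prescribes, and only the first page of results (up to 500 entries) is fetched per species. The first lines also show why the script in the question returns a single species: a dictionary can hold only one 'organism_name' key, so the comprehension keeps the last value.

```python
import json
import time
import urllib.parse
import urllib.request

# A dict has one value per key, so this comprehension keeps only the last species.
collapsed = {'organism_name': s for s in ['Homo sapiens', 'Mus musculus']}
# collapsed == {'organism_name': 'Mus musculus'}

SEARCH_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_query(species):
    """Query string for reviewed (Swiss-Prot) entries of one species."""
    return f'(organism_name:"{species}") AND (reviewed:true)'


def build_url(species, size=500):
    """Full search URL for one species (hypothetical helper)."""
    params = {"query": build_query(species), "format": "json", "size": size}
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def fetch_first_page(species):
    """Download the first page of results for one species (network call).

    Assumption: later pages are ignored here; species with more than
    `size` reviewed entries need the pagination described in the UniProt help.
    """
    with urllib.request.urlopen(build_url(species)) as response:
        return json.load(response)["results"]


def fetch_all(species_names, fetch=fetch_first_page, pause=1.0):
    """Run one request per species, sleeping `pause` seconds between requests."""
    results = {}
    for i, species in enumerate(species_names):
        if i:
            time.sleep(pause)  # be polite to the server
        results[species] = fetch(species)
    return results
```

With the data frame from the question, this could be called as `fetch_all(species_names)`. The `fetch` argument is there so the loop can be tried with a stand-in function before sending 646 real requests.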
thank you!
I have a script here, which you can modify: Need help to retrive sequences
You just need to change the tax ID encoded in the script. Currently it is set to txid9606 (Homo sapiens).
Kevin