I am using this Python script to fetch human protein sequences, but it's not fetching any sequence:
import requests

def fetch_protein_sequence_from_uniprot(protein_name):
    # Declaring UniProt API
    uniport_api_url = f"https://www.uniprot.org/uniprot/?query={protein_name}&format=fasta&organism:9606"
    response = requests.get(uniport_api_url)
    # Parse the response to extract sequence
    sequence = ""
    if response.ok:
        lines = response.text.split("\n")
        for line in lines:
            if not line.startswith(">"):  # Exclude header lines
                sequence += line
    return sequence

# Read protein names from file into a list
with open("prot.txt", "r") as file:
    protein_names = file.read().splitlines()
# Example usage:
protein_names = ["BRCA1", "TP53"]  # Replace with your list of protein names
for name in protein_names:
    sequence = fetch_protein_sequence_from_uniprot(name)
    print(f"Protein Name: {name}")
    print(f"Human Protein Sequence: {sequence}\n")
And what's the problem with adapting JC's solution to use Python instead of shell? He just used a different URL/query pattern. There's nothing to actually change code-wise.
No, but you can understand curl, then look in your code for whatever performs the identical function, find the difference in what is being done, and implement the change.
Just because the pasta shape changes does not mean you need to invent a whole new fork. And the pasta shape has nothing to do with whether you use a fork or a spoon to eat. The URL has changed here; Bash or Python should not matter.
Note that UniProt recently overhauled their REST API, and your URL should start with https://rest.uniprot.org, as seen in the answer from JC. I'm not sure whether changing the URL is enough to solve your issue, but it's a start.
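To make the URL point concrete, here is a minimal sketch of the same lookup against the new host. It is not JC's exact query: the /uniprotkb/search path and the field names gene_exact, organism_id and reviewed are my assumptions about the current query syntax, so check them against the API documentation. It uses urllib from the standard library only so that it runs without extra packages; requests.get would work the same way. The helper names are made up for illustration. Note that the original URL ends in &organism:9606, which is not a key=value parameter, and that the original loop glues every returned sequence into one string.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint on the new REST host; verify against the UniProt API docs.
UNIPROT_SEARCH_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(gene_name, taxon_id=9606, reviewed_only=True):
    """Build a search URL that asks for FASTA records for one gene name."""
    # Assumed query fields: gene_exact, organism_id, reviewed.
    query = f"gene_exact:{gene_name} AND organism_id:{taxon_id}"
    if reviewed_only:
        query += " AND reviewed:true"
    params = urlencode({"query": query, "format": "fasta"})
    return f"{UNIPROT_SEARCH_URL}?{params}"


def parse_fasta(text):
    """Return a list of (header, sequence) tuples, one per FASTA record."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def fetch_protein_sequences(gene_name):
    """Query UniProt over the network and return the parsed records."""
    with urlopen(build_search_url(gene_name), timeout=30) as response:
        return parse_fasta(response.read().decode("utf-8"))
```

Calling fetch_protein_sequences("BRCA1") should then give one (header, sequence) pair per matching entry, so multiple hits stay separate instead of being concatenated.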
updated 15 months ago by Ram • written 15 months ago by jv
What kind of UniProt ID are you using?
I am using just gene names to fetch the sequences.
So it just won't work. Look at what this URL returns:
https://www.uniprot.org/uniprot/?query=KCNH2&format=fasta&organism:9606
Then read the API doc.
Get all human proteins as a single file: https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/reference_proteomes/Eukaryota/UP000005640/UP000005640_9606.fasta.gz
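If you go the single-file route, a rough sketch of looking up sequences by gene name in the downloaded proteome could look like this. It assumes the FASTA headers carry a GN=<gene> field (as UniProt headers usually do) and that the file has been saved locally under the name from the URL above. The function names are made up for illustration.

```python
import gzip
import re

# Assumes UniProt-style headers that contain a "GN=<gene name>" field.
GENE_NAME_PATTERN = re.compile(r"\bGN=(\S+)")


def _store(index, header, chunks):
    """Add one finished record to the index if its header names a gene."""
    if header is None:
        return
    match = GENE_NAME_PATTERN.search(header)
    if match:
        index.setdefault(match.group(1), []).append((header, "".join(chunks)))


def index_fasta_by_gene(handle):
    """Map gene name -> list of (header, sequence) from a FASTA text handle."""
    index = {}
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            _store(index, header, chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    _store(index, header, chunks)
    return index


def load_proteome(path):
    """Open a gzipped FASTA file and index it by gene name."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return index_fasta_by_gene(handle)
```

With the file downloaded, something like load_proteome("UP000005640_9606.fasta.gz").get("BRCA1", []) would return the matching records, and there are no network requests per gene.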