Unable to fetch human protein sequence through UniProt API
15 months ago
anasjamshed ▴ 140

I am using this Python script to fetch human protein sequences, but it's not fetching any sequence:

import requests
def fetch_protein_sequence_from_uniprot(protein_name):
    #Declaring UniProt API
    uniport_api_url = f"https://www.uniprot.org/uniprot/?query={protein_name}&format=fasta&organism:9606"
    response = requests.get(uniport_api_url)

    # Parse the response to extract sequence
    sequence = ""
    if response.ok:
        lines = response.text.split("\n")
        for line in lines:
            if not line.startswith(">"):  # Exclude header lines
                sequence += line
    return sequence

# Read protein names from file into a list
with open("prot.txt", "r") as file:
    protein_names = file.read().splitlines()
# Example usage:
protein_names = ["BRCA1", "TP53"]  # Replace with your list of protein names

for name in protein_names:
    sequence = fetch_protein_sequence_from_uniprot(name)
    print(f"Protein Name: {name}")
    print(f"Human Protein Sequence: {sequence}\n")

Is there any problem with the API URL?

UniPROT python fasta • 1.3k views

What kind of UniProt ID are you using?


I am using just gene names to fetch the sequences.


Then it just won't work. Look at:

https://www.uniprot.org/uniprot/?query=KCNH2&format=fasta&organism:9606

Read the API documentation.

15 months ago
JC 13k

The API endpoint you are using does not accept gene symbols, only UniProt IDs, so you need to use the search API instead:

$ curl -L 'https://rest.uniprot.org/uniprotkb/search?query=gene_exact:BRCA1+AND+organism_id:9606&format=fasta&size=1'
>sp|P38398|BRCA1_HUMAN Breast cancer type 1 susceptibility protein OS=Homo sapiens OX=9606 GN=BRCA1 PE=1 SV=2
MDLSALRVEEVQNVINAMQKILECPICLELIKEPVSTKCDHIFCKFCMLKLLNQKKGPSQ
CPLCKNDITKRSLQESTRFSQLVEELLKIICAFQLDTGLEYANSYNFAKKENNSPEHLKD
EVSIIQSMGYRNRAKRLLQSEPENPSLQETSLSVQLSNLGTVRTLRTKQRIQPQKTSVYI
ELGSDSSEDTVNKATYCSVGDQELLQITPQGTRDEISLDSAKKAACEFSETDVTNTEHHQ
PSNNDLNTTEKRAAERHPEKYQGSSVSNLHVEPCGTNTHASSLQHENSSLLLTKDRMNVE
KAEFCNKSKQPGLARSQHNRWAGSKETCNDRRTPSTEKKVDLNADPLCERKEWNKQKLPC
SENPRDTEDVPWITLNSSIQKVNEWFSRSDELLGSDDSHDGESESNAKVADVLDVLNEVD
EYSGSSEKIDLLASDPHEALICKSERVHSKSVESNIEDKIFGKTYRKKASLPNLSHVTEN
LIIGAFVTEPQIIQERPLTNKLKRKRRPTSGLHPEDFIKKADLAVQKTPEMINQGTNQTE
QNGQVMNITNSGHENKTKGDSIQNEKNPNPIESLEKESAFKTKAEPISSSISNMELELNI
HNSKAPKKNRLRRKSSTRHIHALELVVSRNLSPPNCTELQIDSCSSSEEIKKKKYNQMPV
RHSRNLQLMEGKEPATGAKKSNKPNEQTSKRHDSDTFPELKLTNAPGSFTKCSNTSELKE
FVNPSLPREEKEEKLETVKVSNNAEDPKDLMLSGERVLQTERSVESSSISLVPGTDYGTQ
ESISLLEVSTLGKAKTEPNKCVSQCAAFENPKGLIHGCSKDNRNDTEGFKYPLGHEVNHS
RETSIEMEESELDAQYLQNTFKVSKRQSFAPFSNPGNAEEECATFSAHSGSLKKQSPKVT
FECEQKEENQGKNESNIKPVQTVNITAGFPVVGQKDKPVDNAKCSIKGGSRFCLSSQFRG
NETGLITPNKHGLLQNPYRIPPLFPIKSFVKTKCKKNLLEENFEEHSMSPEREMGNENIP
STVSTISRNNIRENVFKEASSSNINEVGSSTNEVGSSINEIGSSDENIQAELGRNRGPKL
NAMLRLGVLQPEVYKQSLPGSNCKHPEIKKQEYEEVVQTVNTDFSPYLISDNLEQPMGSS
HASQVCSETPDDLLDDGEIKEDTSFAENDIKESSAVFSKSVQKGELSRSPSPFTHTHLAQ
GYRRGAKKLESSEENLSSEDEELPCFQHLLFGKVNNIPSQSTRHSTVATECLSKNTEENL
LSLKNSLNDCSNQVILAKASQEHHLSEETKCSASLFSSQCSELEDLTANTNTQDPFLIGS
SKQMRHQSESQGVGLSDKELVSDDEERGTGLEENNQEEQSMDSNLGEAASGCESETSVSE
DCSGLSSQSDILTTQQRDTMQHNLIKLQQEMAELEAVLEQHGSQPSNSYPSIISDSSALE
DLRNPEQSTSEKAVLTSQKSSEYPISQNPEGLSADKFEVSADSSTSKNKEPGVERSSPSK
CPSLDDRWYMHSCSGSLQNRNYPSQEELIKVVDVEEQQLEESGPHDLTETSYLPRQDLEG
TPYLESGISLFSDDPESDPSEDRAPESARVGNIPSSTSALKVPQLKVAESAQSPAAAHTT
DTAGYNAMEESVSREKPELTASTERVNKRMSMVVSGLTPEEFMLVYKFARKHHITLTNLI
TEETTHVVMKTDAEFVCERTLKYFLGIAGGKWVVSYFWVTQSIKERKMLNEHDFEVRGDV
VNGRNHQGPKRARESQDRKIFRGLEICCYGPFTNMPTDQLEWMVQLCGASVVKELSSFTL
GTGVHPIVVVQPDAWTEDNGFHAIGQMCEAPVVTREWVLDSVALYQCQELDTYLIPQIPH
SHY

Check https://www.uniprot.org/help/api_queries
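
For a Python version of the same request, here is a minimal sketch that keeps the structure of the original script and only swaps in the search URL from the curl command. The helper names build_search_url and fasta_to_sequence are illustrative, not part of UniProt or requests, and the 30-second timeout is an arbitrary assumption. urlencode percent-encodes the colons, so the URL looks slightly different from the curl one but decodes to the same query.

```python
from urllib.parse import urlencode

# Search endpoint taken from the curl command above.
SEARCH_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(gene_symbol, taxon_id=9606):
    """Build the same query the curl command sends (helper name is illustrative)."""
    query = f"gene_exact:{gene_symbol} AND organism_id:{taxon_id}"
    params = {"query": query, "format": "fasta", "size": 1}
    return f"{SEARCH_URL}?{urlencode(params)}"


def fasta_to_sequence(fasta_text):
    """Concatenate the sequence lines of a FASTA response, skipping '>' headers."""
    return "".join(
        line.strip()
        for line in fasta_text.splitlines()
        if line and not line.startswith(">")
    )


def fetch_protein_sequence_from_uniprot(gene_symbol):
    """Return the first matching sequence, or "" if the search finds nothing."""
    import requests  # third-party; imported here so the helpers above work without it

    response = requests.get(build_search_url(gene_symbol), timeout=30)
    response.raise_for_status()  # fail loudly instead of silently returning ""
    return fasta_to_sequence(response.text)
```

Calling fetch_protein_sequence_from_uniprot("BRCA1") should then return the sequence shown above as a single string. Because size=1 keeps only the first hit, it may be worth checking the FASTA header when a gene symbol matches more than one entry.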


I just want to use a Python script to fetch the sequences. I have 193,000 protein interactions.
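
With that many interactions, the same gene will probably appear in many pairs, so one option is to collect the unique symbols first and query each one only once. The sketch below assumes a hypothetical input of one interaction per line with two whitespace-separated gene symbols; fetch_one stands for any function that takes a gene symbol and returns a sequence, and the half-second pause is an arbitrary choice, not a documented UniProt limit.

```python
import time


def unique_gene_symbols(interaction_lines):
    """Collect each gene symbol once, in first-seen order."""
    seen = {}
    for line in interaction_lines:
        for symbol in line.split():
            seen.setdefault(symbol, None)
    return list(seen)


def fetch_sequences(gene_symbols, fetch_one, pause_seconds=0.5):
    """Call fetch_one(symbol) once per symbol and collect the results.

    Failures are recorded separately so that one bad symbol does not
    stop the whole run.
    """
    sequences, failures = {}, {}
    for symbol in gene_symbols:
        try:
            sequences[symbol] = fetch_one(symbol)
        except Exception as error:  # broad on purpose: record and continue
            failures[symbol] = str(error)
        time.sleep(pause_seconds)  # short pause between requests
    return sequences, failures
```

The returned dictionary can then be used to look up both partners of every interaction without contacting the server again.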


And what's the problem with adapting JC's solution to use Python instead of the shell? He has just used a different URL and query pattern. There's nothing to actually change code-wise.


Can we use curl in Python?


No, but you can understand what curl is doing, then look in your code for the part that performs the same function, find the difference in what is being done, and implement the change.

Just because the pasta shape changes does not mean you need to invent a whole new fork. And, the pasta shape has nothing to do with whether you use a fork or a spoon to eat. The URL has changed here, bash or python should not matter.
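
To make "find the difference" concrete, here is a small sketch that takes the URL from the original script and the URL from the curl command apart using only the standard library. The function name describe is made up for this example.

```python
from urllib.parse import urlsplit, parse_qs

# URL from the original script (with BRCA1 filled in) and from the curl answer.
old_url = "https://www.uniprot.org/uniprot/?query=BRCA1&format=fasta&organism:9606"
new_url = (
    "https://rest.uniprot.org/uniprotkb/search"
    "?query=gene_exact:BRCA1+AND+organism_id:9606&format=fasta&size=1"
)


def describe(url):
    """Split a URL into host, path, and decoded query parameters."""
    parts = urlsplit(url)
    # keep_blank_values=True keeps fragments that have no "=value" part
    params = parse_qs(parts.query, keep_blank_values=True)
    return parts.netloc, parts.path, params


for label, url in (("old", old_url), ("new", new_url)):
    host, path, params = describe(url)
    print(label, host, path, params)
```

The host, the path, and the query value all differ, while format=fasta stays the same. The organism:9606 fragment in the old URL has no "=" in it, so it parses as a parameter name with an empty value rather than as part of the query; in the new URL the organism filter sits inside the query value itself.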


I liked the pasta analogy (and real pasta) ;)

15 months ago
jv ★ 1.8k

Note that UniProt recently overhauled their REST API, and your URL should now start with https://rest.uniprot.org, as seen in the answer from JC. I'm not sure if changing the URL is enough to solve your issue, but it's a start.
