3.9 years ago
DdogBoss
I am trying to obtain full protein sequences by protein accession number.
I have the protein accession numbers in CSV files, but I am struggling with how to set up the URL.
The problem is that when I paste the printed URL into a web browser, I land on the BLAST web page.
Sample query: query = "LHTPMY", dataset = "nr", and service = "blastp"
An example protein accession number: XP_023329844.1
What I have so far is:
import requests

def NCBI(query, dataset, service):
    request_base = "https://blast.ncbi.nlm.nih.gov/"
    request_base2 = "Blast.cgi?QUERY="
    query = str(query)
    database = "&DATABASE= %s" % dataset
    service = "&PROGRAM = %s" % service
    NCBI_GI = "&NCBI_GI = T"
    my_request = request_base + request_base2 + query + database + service + NCBI_GI
    print(my_request)
    r = requests.get(url=my_request)
    json_str = r.json()
    return json_str
Documentation here: https://ncbi.github.io/blast-cloud/dev/api.html
Resources and advice are welcome.
Thank you in advance.
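For reference, here is a minimal sketch of how the request URL could be assembled according to the linked documentation. As far as I understand it, the URL API expects a CMD parameter (Put to submit a search, Get to retrieve results by request ID) and no spaces around "=" or inside parameter names, which the string formatting above introduces. The helper names build_put_url and build_get_url are made up for illustration. Note also that the Put response is, to my knowledge, an HTML page containing the request ID rather than JSON, so calling r.json() on it would be expected to fail.

```python
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_put_url(query, database="nr", program="blastp"):
    """Build the URL that submits a search (CMD=Put).

    Hypothetical helper: urlencode joins the pairs with '&' and escapes
    values, so no stray spaces end up in the query string.
    """
    params = {
        "CMD": "Put",
        "PROGRAM": program,
        "DATABASE": database,
        "QUERY": query,
    }
    return BLAST_URL + "?" + urlencode(params)


def build_get_url(rid, format_type="XML"):
    """Build the URL that retrieves results for a request ID (CMD=Get)."""
    params = {"CMD": "Get", "RID": rid, "FORMAT_TYPE": format_type}
    return BLAST_URL + "?" + urlencode(params)


print(build_put_url("LHTPMY"))
```

The request ID passed to build_get_url has to be read out of the Put response first; that parsing step is left out of this sketch.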
Can you not just use the existing Biopython implementation of NCBIWWW etc.?
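If the goal is only the full sequence for an accession you already know, a BLAST search may not be needed at all. A rough sketch using the NCBI E-utilities efetch endpoint with just the standard library is below; Biopython's Bio.Entrez.efetch wraps the same service, while NCBIWWW.qblast is the one for running BLAST searches. The function names here are hypothetical, and the sketch assumes the response is a single FASTA record. NCBI also asks users to keep request rates low, so add a delay when looping over a CSV of accessions.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession):
    """Build an efetch URL asking for one protein record as FASTA text."""
    params = {
        "db": "protein",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


def fetch_fasta(accession, timeout=30):
    """Download the FASTA text for an accession (needs network access)."""
    with urlopen(efetch_fasta_url(accession), timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_fasta(text):
    """Split a single FASTA record into (header, sequence)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    header = lines[0].lstrip(">")
    sequence = "".join(lines[1:])
    return header, sequence


# Example call (not run here because it contacts NCBI):
# header, sequence = parse_fasta(fetch_fasta("XP_023329844.1"))
```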