Download UniProt page source using Python
10.3 years ago
dovah ▴ 40

Hi guys!
I'm trying to save the content of a web page to a file using Python (3.4). More specifically, my aim is to save the ID and FT lines of the UniProt entries for given proteins. I have a text file containing several URLs, and I need to save every related web page.

So far I can only query a single accession code. The print function only lets me display the webpage content in the terminal; if I try to write the query result to a file, I don't get the expected output (just a series of “random” numbers and letters).

So, I was wondering if anyone has tried something like this before and could help me with my issue.
Thanks in advance!

#requesting webpage
import urllib.request
url = 'http://www.uniprot.org/uniprot/APBB1_HUMAN.txt'
req = urllib.request.Request(url)
page = urllib.request.urlopen(req)
src = page.read()  # read() returns the response body as bytes

#display webpage content on terminal
print(src)

#writing to file
with open("query.txt", "w") as f:
    for x in src:
        f.write(str(x))
python uniprot • 5.2k views
10.3 years ago

You might try something like this:
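A minimal sketch, assuming the URLs sit one per line in a file called url_file.txt (both file names here are just placeholders):

import urllib.request

# url_file.txt is assumed to contain one UniProt URL per line, e.g.
# http://www.uniprot.org/uniprot/APBB1_HUMAN.txt
with open('url_file.txt') as urls, open('query.txt', 'w') as out:
    for line in urls:
        url = line.strip()
        if not url:
            continue
        # urlopen() accepts the URL string directly; no Request object is needed
        page = urllib.request.urlopen(url)
        # read() returns bytes, so decode to text before writing
        out.write(page.read().decode('utf-8'))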

Note that you don't need to make a Request object, and that you can eliminate the "with" statements and do something like urls = open('url_file.txt'), but then you would need to explicitly close the filehandles.
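If, as in the question, only the ID and FT lines of each entry are wanted, one possible refinement (a sketch, using a hypothetical keep_id_ft helper) is to filter the decoded text before writing it:

def keep_id_ft(text):
    # keep only the lines whose two-letter line code is ID or FT
    wanted = [l for l in text.splitlines() if l.startswith(('ID', 'FT'))]
    return '\n'.join(wanted) + '\n'

# in the loop above, write the filtered text instead:
# out.write(keep_id_ft(page.read().decode('utf-8')))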
