Download UniProt page source using Python
10.3 years ago
dovah ▴ 40

Hi guys!
I'm trying to save the content of a web page to a file using Python (3.4). More specifically, my aim is to save the ID and FT lines of the UniProt entries for given proteins. I have a text file containing several URLs, and I need to save every related web page.

So far I can only query a single accession code. The print function only lets me display the webpage content in the terminal; if I try to write the query result to a file, I don't get the expected output (just a series of “random” numbers and letters).

So, I was wondering if anyone has tried something like this before and could help me with my issue.
Thanks in advance!

#requesting webpage
import urllib.request
url = 'http://www.uniprot.org/uniprot/APBB1_HUMAN.txt'
req = urllib.request.Request(url)
page = urllib.request.urlopen(req)
src = page.read()  # read() returns the response body as bytes

#display webpage content on terminal
print(src)

#writing to file
with open("query.txt", "w") as f:
    for x in src:
        f.write(str(x))
python uniprot • 5.2k views
10.3 years ago

You might try something like this:
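A minimal sketch, assuming the URLs sit one per line in a file called url_file.txt (both file names here are just placeholders):

import urllib.request

# url_file.txt is assumed to contain one UniProt URL per line, e.g.
# http://www.uniprot.org/uniprot/APBB1_HUMAN.txt
with open('url_file.txt') as urls, open('query.txt', 'w') as out:
    for line in urls:
        url = line.strip()
        if not url:
            continue
        # urlopen() accepts the URL string directly; no Request object is needed
        page = urllib.request.urlopen(url)
        # read() returns bytes, so decode to text before writing
        out.write(page.read().decode('utf-8'))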

Note that you don't need to make a Request object, and that you can eliminate the "with" statements and do something like urls = open('url_file.txt'), but then you would need to explicitly close the filehandles.
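If, as in the question, only the ID and FT lines of each entry are wanted, one possible refinement (a sketch, using a hypothetical keep_id_ft helper) is to filter the decoded text before writing it:

def keep_id_ft(text):
    # keep only the lines whose two-letter line code is ID or FT
    wanted = [l for l in text.splitlines() if l.startswith(('ID', 'FT'))]
    return '\n'.join(wanted) + '\n'

# in the loop above, write the filtered text instead:
# out.write(keep_id_ft(page.read().decode('utf-8')))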
