How Difficult/Reliable Is It To Programmatically (Python) Look Up And Download Papers?
14.7 years ago
Rvidal ▴ 270

I know that it is possible to write a script that uses the EZproxy service most universities run to download papers directly from a search query. I have seen a Perl implementation of this, but I was looking for something a bit cleaner, and hopefully in Python.

I don't mind if the script only works from within a university network, but it would have to be able to check whether a paper is accessible from the current IP address. I'm not sure how feasible this is, hence my question...
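One way to frame the accessibility check is to request the paper and see whether an actual PDF comes back instead of a login or paywall page. The sketch below uses only the standard library; the function names and the User-Agent string are made up for illustration, and it assumes a PDF starts with the bytes `%PDF-`, which will not hold for every file served in practice.

```python
import urllib.request

PDF_MAGIC = b"%PDF-"  # assumed signature at the very start of a PDF file


def looks_like_pdf(first_bytes):
    """Return True if the first bytes of a response look like a PDF."""
    return first_bytes.startswith(PDF_MAGIC)


def is_accessible(url, timeout=30):
    """Fetch the start of `url` and report whether a PDF came back.

    Hypothetical helper: a login or paywall page is usually HTML, so it
    fails the signature check even when the HTTP status is 200.
    """
    request = urllib.request.Request(
        url, headers={"User-Agent": "paper-fetcher-sketch"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return looks_like_pdf(response.read(len(PDF_MAGIC)))
    except OSError:  # URLError and HTTPError are both OSError subclasses
        return False
```

Calling `is_accessible(pdf_url)` from inside and outside the campus network should then give different answers for subscription-only papers.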

literature python text • 4.3k views
14.7 years ago

Have you tried py-ezproxy?

The Perl script you saw may have been written with the Mechanize library. If so, take a look at mechanize for Python, which is a reimplementation of the same concept. Either way, you can use mechanize to connect to the internet through a proxy and do what you are asking for.
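As a rough illustration of the proxy idea, here is a minimal sketch that uses `urllib.request` from the standard library rather than mechanize, so it runs without extra installs. The proxy address and EZproxy host are placeholders, and the `login?url=` pattern is an assumption based on the common EZproxy starting-point URL form; your library's setup may differ.

```python
import urllib.request


def build_proxy_opener(proxy_address):
    """Return an opener that sends HTTP and HTTPS requests via `proxy_address`."""
    handler = urllib.request.ProxyHandler(
        {"http": proxy_address, "https": proxy_address}
    )
    return urllib.request.build_opener(handler)


def ezproxy_login_url(ezproxy_base, target_url):
    """Wrap `target_url` in an EZproxy-style starting-point URL.

    Assumption: the institution's EZproxy accepts `<base>/login?url=<target>`.
    """
    return f"{ezproxy_base.rstrip('/')}/login?url={target_url}"


# Placeholder addresses, not real services.
opener = build_proxy_opener("http://proxy.example.edu:8080")
start_url = ezproxy_login_url("https://ezproxy.example.edu", "https://doi.org/10.1000/xyz")
# opener.open(start_url) would then perform the request through the proxy.
```

mechanize adds form handling and cookie management on top of this, which matters once the proxy asks for a login.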


Mechanize is a great tool. You could even go and use twill (a library built on Mechanize) to further simplify the usage that you are after: http://pypi.python.org/pypi/twill/0.9


I second twill. Much easier to use than Mechanize for most simple tasks.

14.7 years ago
Ian Simpson ▴ 960

We have certainly written scripts using Mechanize for this in the past, and they picked up ~85-90% of articles for which PDFs were available. They trawled around looking for links, forwards, etc. to PDFs.

So you could go that way, but you might want to take a look at Pubget (http://pubget.com/). I haven't had a close look, but they have an API that you might be able to use to do the hard work for you. As I say, I don't know how good the return rate is with this.
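The link-trawling step can be sketched with the standard library's `html.parser`. This is not the script described above, only an illustration of the idea: it collects anchors whose `href` mentions "pdf" and, as an assumption about publisher pages, also reads the `citation_pdf_url` meta tag that many journals emit. The class and function names are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkParser(HTMLParser):
    """Collect candidate PDF URLs from an article landing page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a":
            candidate = attributes.get("href")
        elif tag == "meta" and attributes.get("name") == "citation_pdf_url":
            candidate = attributes.get("content")
        else:
            candidate = None
        # Crude heuristic: keep anything whose URL mentions "pdf".
        if candidate and "pdf" in candidate.lower():
            absolute = urljoin(self.base_url, candidate)
            if absolute not in self.pdf_links:
                self.pdf_links.append(absolute)


def find_pdf_links(html_text, base_url):
    """Return absolute candidate PDF URLs found in `html_text`, in page order."""
    parser = PdfLinkParser(base_url)
    parser.feed(html_text)
    return parser.pdf_links
```

A heuristic this simple will miss PDFs hidden behind JavaScript or intermediate pages, which is one reason a hit rate below 100% is to be expected.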


OK, it looks like it requires that the object has a DOI, so you MAY have some issues with older articles.


This could be a limitation of the API, as the web interface allows a reasonable proxy to a PubMed search, although at present it doesn't support the [tags] search that I really like using for PubMed searches (e.g. bloggs_j[1AU]). Still, it looks pretty useful.


That API actually looks good for a web app I'm working on. Thanks! However, my initial question was aimed more at a Python script that would retrieve a given list of papers when a list of, say, PubMed IDs is provided. I will look into Mechanize and see how that goes.
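For the PubMed ID side of this, one possible starting point is to turn the IDs into DOIs first and resolve those through the proxy afterwards. The sketch below assumes the NCBI E-utilities `esummary` endpoint and a JSON layout with `result`, `uids`, and per-article `articleids` entries; none of that comes from this thread, so check it against the current E-utilities documentation. The IDs shown are placeholders.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint; verify against the NCBI E-utilities documentation.
ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def esummary_url(pubmed_ids):
    """Build an esummary request URL for a list of PubMed ID strings."""
    query = urlencode(
        {"db": "pubmed", "id": ",".join(pubmed_ids), "retmode": "json"}
    )
    return f"{ESUMMARY_URL}?{query}"


def dois_from_esummary(payload_text):
    """Map each PubMed ID in an esummary JSON payload to its DOI, or None.

    Assumes each record lists identifiers under "articleids" as
    {"idtype": ..., "value": ...} entries.
    """
    result = json.loads(payload_text).get("result", {})
    dois = {}
    for uid in result.get("uids", []):
        article_ids = result.get(uid, {}).get("articleids", [])
        dois[uid] = next(
            (entry["value"] for entry in article_ids if entry.get("idtype") == "doi"),
            None,
        )
    return dois
```

Papers that come back with `None` have no DOI on record, which is where the older articles are likely to fall through.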
