Question

Taverna Workflow To Retrieve 1500 Papers For One Or Two Keywords As Plain Text?

6

14.3 years ago

Egon Willighagen 5.4k

What Taverna 2 workflow can I use to run a query against the Open Access subsection of PubMed and return a subset of up to 1500 papers for further processing, including text mining? If no complete workflow is available, I am interested in workflows that do similar things, particularly if they are hosted on MyExperiment.org. I am happy if it involves new plugins. I am also happy with solutions that use XSLT and BeanShell scripts.

Update: the bounty goes to the most functional (open source) Taverna BeanShell script.

literature pubmed text • 8.8k views

updated 14.1 years ago by Andrea_Bio ★ 2.9k • written 14.3 years ago by Egon Willighagen 5.4k

0

I do not know of a Taverna workflow that does this already, but you can easily retrieve an XML document listing the PMC open-access identifiers via:

http://www.pubmedcentral.nih.gov/oai/oai.cgi?verb=ListIdentifiers&metadataPrefix=pmc&set=pmc-open

From that XML document you can then fetch the individual records with further queries, such as:

http://www.pubmedcentral.nih.gov/oai/oai.cgi?verb=GetRecord&identifier=oai:pubmedcentral.nih.gov:17827&metadataPrefix=pmc

More about PMC-OAI at: http://www.ncbi.nlm.nih.gov/pmc/about/oai.html
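The two requests above could be wired together roughly as follows. This is only a sketch in plain Java, not a tested Taverna component: the class and method names (`PmcOaiSketch`, `listIdentifiersUrl`, `getRecordUrl`, `parseIdentifiers`) are made up for illustration, and it assumes the response follows the standard OAI-PMH layout, with each record's ID in an `<identifier>` element. It does not follow the `resumptionToken` that OAI-PMH uses to page through long lists, so reaching 1500 identifiers may take several requests. For a BeanShell step it may need adapting (for example, dropping the generics).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; names are illustrative only.
class PmcOaiSketch {
    // Base URL as given in the comment above.
    static final String BASE = "http://www.pubmedcentral.nih.gov/oai/oai.cgi";

    /** URL that lists the identifiers in the pmc-open set. */
    static String listIdentifiersUrl() {
        return BASE + "?verb=ListIdentifiers&metadataPrefix=pmc&set=pmc-open";
    }

    /** URL that fetches a single record by its OAI identifier. */
    static String getRecordUrl(String oaiIdentifier) {
        return BASE + "?verb=GetRecord&identifier=" + oaiIdentifier + "&metadataPrefix=pmc";
    }

    /** Extracts at most `limit` identifier values from a ListIdentifiers response. */
    static List<String> parseIdentifiers(String xml, int limit) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // OAI-PMH responses use a default namespace
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // "*" matches any namespace, so only the local name matters here.
            NodeList nodes = doc.getElementsByTagNameNS("*", "identifier");
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < nodes.getLength() && ids.size() < limit; i++) {
                ids.add(nodes.item(i).getTextContent().trim());
            }
            return ids;
        } catch (Exception e) {
            throw new RuntimeException("Could not parse OAI-PMH response", e);
        }
    }
}
```

In use, you would download the document at `listIdentifiersUrl()`, pass its text to `parseIdentifiers(xml, 1500)`, and then request `getRecordUrl(id)` for each identifier returned.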

Hope this helps a bit, even though I cannot answer your original question.

updated 5.4 years ago by Ram 44k • written 14.3 years ago by Joachim ★ 2.9k

0

Would you be happy to use UKPubMed instead of PubMed? http://ukpmc.ac.uk/ UKPubMed indexes only open-access documents and, unlike PubMed, it also indexes PhD theses published in the UK. I am not sure I will have time to prepare a Taverna workflow for this, but judging from the page it should not be difficult.

14.1 years ago by Giovanni M Dall'Olio 28k

0

Absolutely, either is fine!

14.1 years ago by Egon Willighagen 5.4k

score 2 · Answer 1 · 2010-12-16

2

14.1 years ago

Stain ▴ 20

I assume you've already checked http://www.myexperiment.org/search?query=pubmed&type=all?

Ram · Answer 2 · 2010-12-16

I am afraid that is very difficult to do.

One option is to log in to PubMed Central, run a query, and then retrieve all the PDFs. I am sure you can automate this with Taverna, but it has been too long since I last used it and I could not program it myself.

Another option is to use the UKPubMed archive: run a query there and then download all the PDFs by constructing URLs like:

http://ukpmc.ac.uk/articles/<_insert_Pubmed_Central_ID_>?pdf=render

It seems that UKPubMed does not have an API like Entrez. I don't know whether you can get the PubMed Central ID from the Entrez E-utilities, which are already implemented in Taverna, but if you can, it would make things easier.

In any case, with all of these methods you will be able to download only the open-access articles.
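The URL construction for the UKPubMed option could look something like the sketch below. It is plain Java that only builds the download URLs from the pattern quoted above; it fetches nothing. The class and method names (`UkpmcPdfUrls`, `pdfUrl`, `pdfUrls`) are hypothetical, and the exact form of the PubMed Central ID the site expects (for example, with or without a `PMC` prefix) is an assumption you would need to check, so the IDs are passed through unchanged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; names are illustrative only.
class UkpmcPdfUrls {
    // URL pattern from the answer above; %s is the PubMed Central ID.
    static final String PATTERN = "http://ukpmc.ac.uk/articles/%s?pdf=render";

    /** Builds the PDF URL for one PubMed Central ID (ID format is an assumption). */
    static String pdfUrl(String pmcId) {
        return String.format(PATTERN, pmcId.trim());
    }

    /** Builds PDF URLs for at most `limit` IDs, preserving input order. */
    static List<String> pdfUrls(List<String> pmcIds, int limit) {
        List<String> urls = new ArrayList<>();
        for (String id : pmcIds) {
            if (urls.size() >= limit) {
                break; // stop once the cap (e.g. 1500 papers) is reached
            }
            urls.add(pdfUrl(id));
        }
        return urls;
    }
}
```

Calling `pdfUrls(ids, 1500)` on the IDs returned by a query would give the list of URLs to hand to a download step.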

score 1 · Answer 3 · 2010-12-23

1

14.1 years ago

Andrea_Bio ★ 2.9k

Is this of any use? If it is not exactly what you need, you could modify the top two.

http://www.myexperiment.org/packs/163.html
