Question

Mining Papers On A Desired Topic Based On Certain Criteria

5

13.6 years ago

Arun 2.4k

I would like to obtain mainly two things. Suppose that the tag (or topic) I'd like to mine is "circadian clock".

1) I'd like to find all keywords (wherever possible; I guess only relatively recent papers have keywords?), or any other equivalent main words, that are associated with this topic. I am interested in creating a tag cloud or word cloud. I think it's a cool opening slide in a presentation. What do you guys think?

2) I'd like to mine for pioneering papers (major findings / breakthroughs) in this field (probably more restriction criteria should apply here, such as humans or plants, etc.) or, failing that, the most cited papers, from all available papers. This is basically for literature reading. How does one go about finding the papers that one should definitely have read?

I know how to generate a word cloud in R. I'd like to know if it's possible to extract this information somehow from PubMed (possibly using the tm R package?).

Thank you in advance for your suggestions, Best, Arun.

data text r pubmed • 3.9k views

updated 13.6 years ago by Malachi Griffith 20k • written 13.6 years ago by Arun 2.4k

score 4 · Answer 1 · 2011-12-17

An easy solution would be to search scholar.google.com for "circadian clock" and capture the top X results (X is a number of your choosing). Then, sort these by the "cited N times" field to find papers such as:

Role of the CLOCK protein in the mammalian circadian mechanism. N Gekakis, D Staknis, HB Nguyen, FC Davis… - Science, 1998 - sciencemag.org. Abstract: The mouse Clock gene encodes a bHLH-PAS protein that regulates circadian rhythms and is related to transcription factors that act as heterodimers. Potential partners of CLOCK were isolated in a two-hybrid screen, and one, BMAL1, was coexpressed with ... Cited by 915

You could even run a metric like 915 citations / 13 yrs in print ≈ 70 citations/yr, so that older papers don't win simply because they have had more time to accumulate citations.
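If you end up collecting the titles, years, and citation counts by hand or from an export, the citations-per-year ranking is easy to script. Here is a minimal Python sketch; the `Paper` class, the function names, and the second paper in the list are made up for illustration, and only the Gekakis et al. numbers come from the result above.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    year: int        # publication year
    citations: int   # "Cited by N" count, copied from the search result


def citations_per_year(paper, current_year):
    """Citations divided by years in print (at least 1, to avoid dividing by zero)."""
    years_in_print = max(current_year - paper.year, 1)
    return paper.citations / years_in_print


def rank_papers(papers, current_year):
    """Return papers sorted by citations per year, highest first."""
    return sorted(
        papers,
        key=lambda p: citations_per_year(p, current_year),
        reverse=True,
    )


papers = [
    Paper("Role of the CLOCK protein in the mammalian circadian mechanism", 1998, 915),
    Paper("Hypothetical recent paper", 2008, 300),  # made-up entry for illustration
]

for p in rank_papers(papers, current_year=2011):
    print(f"{citations_per_year(p, 2011):6.1f}  {p.title}")
```

Note that the made-up 2008 paper outranks the 1998 one here even though it has fewer total citations, which is the point of normalising by years in print.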

Ram · Answer 2 · 2011-12-17

2

13.6 years ago

Zev.Kronenberg 12k

Check out F1000.

They usually have great insight. In fact just browsing the website would be sufficient.

0

The two studies to date on the topic actually show that F1000 reviews miss the majority of highly cited papers:
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0005910
http://library.queensu.ca/ojs/index.php/IEE/article/view/2379/2478

written 13.6 years ago by Casey Bergman 18k • updated 5.8 years ago by Ram 45k

0

Thanks, Zev. It seems to be a paid service, and the studies above don't help its case either.

written 13.6 years ago by Arun 2.4k

score 2 · Answer 3 · 2011-12-17

2

13.6 years ago

Alex Paciorkowski 3.5k

A resource like Arrowsmith has sometimes been helpful for broadening/deepening literature mining expeditions. I often find a lot of interesting connections that are not picked up by PubMed alone. I'm not sure they can give you the "most cited" data, but you could send the list through Google Scholar to get that metric, as Larry suggests above.

0

Thanks for this link, Alex.

written 13.6 years ago by Arun 2.4k

score 2 · Answer 4 · 2011-12-19

2

13.6 years ago

Malachi Griffith 20k

How about using LigerCat?

LigerCat: using "MeSH Clouds" from journal, article, or gene citations to facilitate the identification of relevant biomedical literature.

This publication also sounds relevant: A document clustering and ranking system for exploring MEDLINE citations

The LigerCat tag cloud for 'circadian rhythm' is provided below as an example:

[Image: LigerCat tag cloud for 'circadian rhythm']
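If you would rather build the cloud yourself, the same idea (counting MeSH headings across a set of citations) can be sketched in a few lines. This is not how LigerCat itself is implemented; it is a rough Python illustration that assumes you have saved your PubMed results in the MEDLINE text format, where each MeSH heading sits on a line starting with `MH  - `. The function name and the two sample records are made up.

```python
from collections import Counter


def mesh_term_counts(medline_text):
    """Count MeSH headings in MEDLINE-format text (lines tagged 'MH  - ').

    Subheadings after '/' and the '*' major-topic marker are dropped, so
    'Circadian Rhythm/*genetics' is counted as 'Circadian Rhythm'.
    """
    counts = Counter()
    for line in medline_text.splitlines():
        if line.startswith("MH  - "):
            heading = line[6:].split("/")[0].replace("*", "").strip()
            if heading:
                counts[heading] += 1
    return counts


# Two made-up records, only to show the expected input shape.
sample = """\
PMID- 1
TI  - Made-up record one
MH  - Circadian Rhythm/*genetics
MH  - Animals
MH  - *CLOCK Proteins

PMID- 2
TI  - Made-up record two
MH  - Circadian Rhythm/physiology
MH  - Animals
"""

counts = mesh_term_counts(sample)
for term, n in counts.most_common():
    print(n, term)
```

The resulting term/frequency pairs can be written out to a CSV and fed to whatever word cloud tool you already use in R. Author-supplied keywords, where present, could be counted the same way from the `OT  - ` lines.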

0

This is so far the coolest one I have seen. Let me try this out and I'll get back to you.

written 13.6 years ago by Arun 2.4k