Question

Word clouds from PubMed titles

10.2 years ago

cdsouthan ★ 1.9k

From a large list of PMIDs (20K to 100K) I'd like to make a word cloud of at least the titles (I guess the abstract text is too much?). This is kind of obvious, so I'm sure it's been done (http://www.jcheminf.com/content/6/1/40 comes close).

Would it also be possible to make a separate word cloud from just the MeSH terms under those PMIDs? (OK, OK - that's two questions)

It needs to be simplish (web tool?) or even a KNIME node.

ADD COMMENT • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by cdsouthan ★ 1.9k

A similar question was asked some time ago :-)

Web Tool That Converts A Pubmed Query Into A Wordle Of The Abstracts

ADD REPLY • link 10.2 years ago by PoGibas 5.1k

OK - missed that - but 4 years ago was LBK (Life Before KNIME)

ADD REPLY • link 10.2 years ago by cdsouthan ★ 1.9k

If anyone from NCBI is listening, could this be a cool (NOBA useful!) option for PubMed query outputs in general?

ADD REPLY • link updated 2.9 years ago by Ram 44k • written 10.2 years ago by cdsouthan ★ 1.9k

Ram · Answer 1 · 2014-09-17

Not really a simplish way to do it...

Say you have your titles, one per line, in titles.txt. You could do:

Remove non-alphanumeric characters and put one word per line (GNU sed):

sed -r 's/[^A-Za-z0-9 ]//g' titles.txt | sed 's/ /\n/g' > words.txt
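Before moving on to R, it can help to check the tokenization on a tiny input and look at the most frequent words. Below is a minimal sketch of that check. The two sample titles and the `*_sample.txt` file names are made up for illustration, and `tr -s` is swapped in for the second sed call so that runs of spaces do not produce empty lines.

```shell
#!/bin/sh
# Hypothetical sample input: two made-up titles, one per line.
printf '%s\n' \
  'Kinase inhibitors in cancer: a review' \
  'Novel kinase targets for cancer therapy' > titles_sample.txt

# Keep only letters, digits and spaces, then put one word per line.
# tr -s turns each space into a newline and squeezes repeated newlines.
sed -E 's/[^A-Za-z0-9 ]//g' titles_sample.txt | tr -s ' ' '\n' > words_sample.txt

# Upper-case the words, count each one, and list the most frequent first.
tr '[:lower:]' '[:upper:]' < words_sample.txt \
  | sort | uniq -c | sort -k1,1nr -k2,2 > word_counts_sample.txt

cat word_counts_sample.txt
```

If the top of that list is dominated by words like IN, OF or THE on real data, that is a sign the common-word filtering mentioned in the R step is worth doing.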

Read in R and plot:

R
library(wordcloud)
library(RColorBrewer)  # provides brewer.pal()

## Read words as character strings, not factors (matters in R < 4.0)
words<- read.table('words.txt', sep= '\t', stringsAsFactors= FALSE)
words$words<- toupper(words$V1)

## Count each word once and reuse the table
counts<- table(words$words)
dat<- data.frame(
    word= names(counts),
    word_count= as.vector(counts),
    word_len= nchar(names(counts)),
    stringsAsFactors= FALSE)

## Remove short words (probably also common English words should go)
dat<- dat[dat$word_len > 2,]

## Plot
pal<- brewer.pal(8,"Dark2")
wordcloud(words= dat$word, freq= dat$word_count, rot.per=0, colors=pal)
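The comment in the R code notes that common English words should probably go too. One way to handle that is to filter the one-word-per-line file against a stop-word list before reading it into R. This is only a sketch: the stop-word list here is a short made-up placeholder rather than any standard list, and all file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical stop-word list, one word per line (placeholder, not a standard list).
printf '%s\n' a an and for in of the > stopwords_sample.txt

# Hypothetical tokenized input, one word per line, as the sed step would produce.
printf '%s\n' Kinase inhibitors in cancer a review of The literature > words_demo.txt

# -F: fixed strings, -x: match whole lines only, -i: ignore case,
# -v: keep the lines that do NOT match any stop word.
grep -v -i -x -F -f stopwords_sample.txt words_demo.txt > words_filtered.txt

cat words_filtered.txt
```

With a real list in place, read.table can be pointed at the filtered file instead of words.txt, and the rest of the R code stays the same.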