Word clouds from PubMed titles
10.2 years ago
cdsouthan ★ 1.9k

From a large list of PMIDs (between 20K and 100K) I'd like to make a word cloud of at least the titles (I guess the abstract text is too much?). This is kind of obvious, so I'm sure it's been done (http://www.jcheminf.com/content/6/1/40 comes close).

Would it also be possible to make a separate word cloud from just the MeSH terms under those PMIDs? (OK, OK - that's two questions)

Needs to be simplish (web tool?) or even a KNIME node


A similar question was asked some time ago :-)

Web Tool That Converts A Pubmed Query Into A Wordle Of The Abstracts


OK - missed that - but 4 years ago was LBK (Life Before KNIME)


If anyone from NCBI is listening, could this be a cool (NOBA useful!) option for PubMed query outputs in general?

10.2 years ago

Not really a simplish way to do it...

Say you have your titles, one per line, in titles.txt. You could do:

Remove non-alphanumeric characters and put one word per line (GNU sed syntax):

sed -r 's/[^A-Za-z0-9 ]//g' titles.txt | sed 's/ /\n/g' > words.txt
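
The `-r` flag and `\n` in the replacement are GNU sed features, so the command above may not behave the same on BSD or macOS sed. A rough portable sketch of the same step using `tr` is below; the function name `titles_to_words` is made up for illustration, and it assumes plain ASCII titles, one per line.

```shell
# titles_to_words TITLES_FILE
# Hypothetical helper: prints one word per line from a file of titles.
titles_to_words() {
  # 1. delete everything except letters, digits, spaces and newlines
  # 2. turn spaces into newlines, squeezing repeated newlines into one
  # 3. drop any empty line that is left over
  tr -d -c 'A-Za-z0-9 \n' < "$1" | tr -s ' ' '\n' | grep -v '^$'
}

# Example usage:
# titles_to_words titles.txt > words.txt
```

As with the sed version, punctuation is deleted rather than replaced by a space, so a hyphenated term such as "Gene-expression" ends up as the single word "Geneexpression".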

Read in R and plot:

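R
library(wordcloud)
library(RColorBrewer)  # provides brewer.pal()

## readLines() keeps every word as text (read.table would turn the word "NA" into a missing value)
words<- toupper(readLines('words.txt'))
words<- words[words != '']

## Tabulate once and reuse the counts
counts<- table(words)
dat<- data.frame(
    word= names(counts),
    word_count= as.vector(counts),
    word_len= nchar(names(counts)),
    stringsAsFactors= FALSE)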

## Remove short words (probably also common English words should go)
dat<- dat[dat$word_len > 2,]

## Plot
pal<- brewer.pal(8,"Dark2")
wordcloud(words= dat$word, freq= dat$word_count, rot.per=0, colors=pal)
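
The comment above about dropping common English words could also be handled at the shell step, before anything is read into R. The following is only a sketch: the function name `count_words` and the stop-word file are hypothetical, and the stop-word list is assumed to hold one uppercase word per line.

```shell
# count_words WORDS_FILE STOPWORDS_FILE
# Hypothetical helper: prints "count WORD" lines, most frequent first.
count_words() {
  # uppercase everything, keep words of 3+ characters,
  # drop exact matches from the stop-word list, then count and sort
  tr '[:lower:]' '[:upper:]' < "$1" \
    | grep -E '^.{3,}$' \
    | grep -vxFf "$2" \
    | sort | uniq -c | sort -k1,1nr -k2,2 \
    | awk '{print $1, $2}'
}

# Example usage (stopwords.txt is an assumed file, e.g. containing THE, AND, FOR, WITH):
# count_words words.txt stopwords.txt > word_counts.txt
```

The resulting two-column file already has the counts, so in R the table() step can be skipped and the columns passed straight to the words and freq arguments of wordcloud().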