From a large list of PMIDs (between 20K and 100K) I'd like to make a word cloud of at least the titles (I guess the abstract text is too much?). This is kind of obvious, so I'm sure it's been done (http://www.jcheminf.com/content/6/1/40 comes close).
Would it also be possible to make a separate word cloud from just the MeSH terms under those PMIDs? (OK, OK - that's two questions)
Needs to be simplish (web tool?) or even a KNIME node.
Say you have your titles, one per line, in titles.txt. You could do:
Remove non-alphanumeric characters and put one word per line:
sed -r 's/[^A-Za-z0-9 ]//g' titles.txt | sed 's/ /\n/g' > words.txt
Read in R and plot:
R
library(wordcloud)
## quote='' keeps stray quote characters from merging lines
words<- read.table('words.txt', sep='\t', quote='', stringsAsFactors=FALSE)
words$words<- toupper(words$V1)
## Tabulate once and reuse the counts
counts<- table(words$words)
dat<- data.frame(
    word= names(counts),
    word_count= as.vector(counts),
    word_len= nchar(names(counts)),
    stringsAsFactors= FALSE)
## Remove short words (probably also common English words should go)
dat<- dat[dat$word_len> 2,]
## Plot
pal<- brewer.pal(8,"Dark2")
wordcloud(words= dat$word, freq= dat$word_count, rot.per=0, colors=pal)
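The code comment above suggests that common English words should probably go too. Here is a minimal sketch of one way to do that in the shell before plotting. It assumes a stop-word list with one upper-case word per line; the function name count_words and the file name stopwords.txt are made up for illustration and are not part of the original answer.

```shell
# count_words TITLES STOPWORDS
# Hypothetical helper: prints "count WORD" lines, most frequent first.
# TITLES has one title per line; STOPWORDS has one upper-case word per line.
count_words() {
  tr -cd 'A-Za-z0-9 \n' < "$1" |   # delete non-alphanumeric characters
    tr -s ' ' '\n' |               # put one word per line
    tr 'a-z' 'A-Z' |               # upper-case, as the R code does
    awk 'length($0) > 2' |         # drop words of one or two characters
    grep -vxFf "$2" |              # drop exact matches from the stop-word list
    sort | uniq -c | sort -k1,1nr  # count each word and rank by frequency
}
```

It could be called as count_words titles.txt stopwords.txt > word_counts.txt. The result has the count in the first column and the word in the second, so it could be read into R with read.table and passed to wordcloud() in place of the table() step.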
A similar question was asked some time ago :-)
Web Tool That Converts A Pubmed Query Into A Wordle Of The Abstracts
OK - missed that - but 4 years ago was LBK (Life Before KNIME)
If anyone from NCBI is listening, could this be a cool (NOBA useful!) option for PubMed query outputs in general?