Bioinformatics word cloud to use in classes
5.8 years ago
gramarga ▴ 50

Hi everyone,

Whenever I teach an introductory class about Bioinformatics, I like to use this word cloud - I feel it gives a quick glimpse of the field. However, it is outdated (from 2011).

So I want to create an up-to-date figure. I understand I can manually copy the tag counts from the tag content and use only the first few pages, because the word counts quickly drop below 15 or so.

However, ideally I would like to have the data for each year separately, to show how topics change over time. Unfortunately, I am a new user (first post here) and do not have the privileges to download the database as shown in the blog post above.

Would anyone with access be able / willing to fetch and share these data?

Thank you very much!

teaching education • 4.8k views

Nice application! Tags are very often used incorrectly though. An alternative approach could be to use the title/abstract of recent (bioinformatics) papers? Although filtering those terms obviously requires some more work to get the bioinformatics terminology out.


Well, using paper title/abstracts would certainly be useful, although in a different way.

For my teaching purposes, I actually want to use community-based information, in the sense that it reflects user needs. I expect my students to face many of the questions that other users experience. For instance, note that the tag software error has been used 1838 times - I would then address the value of resources such as Biostars. It also shows trends in language use (e.g., noticeable drop in perl, increasing importance of python and especially R).

Some noise due to incorrect tag usage should not be a big deal. In any case, I will filter for only the most used ones, so the signal will still be there.
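The filtering step described here could look roughly like the sketch below. It is only an illustration: the `most_used` helper and every tag except `software error` (with its count of 1838) are made up, and the cutoff of 15 simply echoes the "below 15 or so" remark in the question.

```python
from collections import Counter

# Hypothetical tag counts; only "software error" (1838) comes from the thread.
tag_counts = Counter({
    "software error": 1838,
    "example-tag-a": 120,
    "example-tag-b": 14,
    "example-tag-c": 3,
})

def most_used(counts, min_count=15):
    """Return (tag, count) pairs with count >= min_count, highest count first."""
    return [(tag, n) for tag, n in counts.most_common() if n >= min_count]

print(most_used(tag_counts))
```

Rarely used (and more likely mis-applied) tags fall below the cutoff, so the frequent ones still carry the signal.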

Thanks for the input!


Gha, funny that you pick the example of software error. Because it is used so frequently it will become the first suggestion users get when making a new post, leading it to be used more and more. Often it's actually a user error :)


Even better!

Nice to know this, because in classes students show similar behavior. They often jump the gun and call anything a software error, when it is usually just a typo. This will make a good example!


Here is a list of tags with counts:

http://data.biostarhandbook.com/data/biostar-tags.txt
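For anyone who wants to load such a list programmatically, here is a minimal parsing sketch. The file layout is an assumption (one `tag<TAB>count` pair per line); I have not checked the actual format, so adjust the separator and column order as needed. The `parse_tag_counts` name and the sample text are hypothetical.

```python
def parse_tag_counts(text):
    """Parse lines of the ASSUMED form 'tag<TAB>count' into a dict."""
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Split on the last tab so that tags containing spaces stay intact.
        tag, _, raw = line.rpartition("\t")
        if not tag or not raw.isdigit():
            continue  # skip lines that do not match the assumed layout
        counts[tag] = int(raw)
    return counts

# Made-up sample in the assumed layout, standing in for the downloaded file.
sample = "software error\t1838\nexample-tag\t42\n\nnot a data line\n"
print(parse_tag_counts(sample))
```

The resulting dict can be passed to whichever word cloud tool you prefer.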

5.8 years ago

Instead of using Biostars, use PubMed and the MeSH terms. Here is an example using my tools http://lindenb.github.io/jvarkit/PubmedDump.html and http://lindenb.github.io/jvarkit/XsltStream.html


Jeez, this is very anthropocentric... ;)


Pierre, thanks for pointing this out!

Please, see my reply above about wanting to reflect community questions.


The blog post "What is bioinformatics about?" also has a recipe for creating word clouds from abstracts. At the moment, though, the blog appears to be down for me.


What tool creates the cloud itself? The styling looks cool; what parameters does it need to look like that? Now that I have played a bit with word clouds, I think that figuring out the right styling is a separate challenge in its own right.


After googling: https://wordart.com/


Love the standalone "High".

5.8 years ago

Here is an image I made from the data at http://data.biostarhandbook.com/data/biostar-tags.txt:

[Image: word cloud built from the tag counts]


I used these data and made a figure to look like the one from the original post. RNA-seq pretty much overwhelms everything else.

[Image: word cloud in the original blog style]

Same data with different scaling.

[Image: different scale and style]
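To illustrate why scaling matters when one tag dominates, here is a small sketch that maps counts to font sizes either linearly or logarithmically. It is not the procedure used for the figures; the `font_sizes` helper, the size range, and the counts are all hypothetical.

```python
import math

def font_sizes(counts, min_size=10, max_size=100, scale="linear"):
    """Map positive counts to font sizes between min_size and max_size."""
    transform = math.log if scale == "log" else float
    values = {tag: transform(n) for tag, n in counts.items()}
    lo, hi = min(values.values()), max(values.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all counts are equal
    return {
        tag: round(min_size + (v - lo) / span * (max_size - min_size))
        for tag, v in values.items()
    }

# Hypothetical counts with one dominant tag.
counts = {"dominant-tag": 10000, "mid-tag": 1000, "rare-tag": 100}
print(font_sizes(counts, scale="linear"))
print(font_sizes(counts, scale="log"))
```

With linear scaling the mid-frequency tag ends up barely larger than the rare one, while log scaling places it about halfway between the extremes, so the dominant tag no longer swamps the picture.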


Perhaps consider working your process up in a github repo or blog post? Others might like to reuse this approach in future :)


Sure! Would you mind pointing me to a GitHub example with good practices?


Looks like R is missing. It is the second most common tag, so it was probably dropped because it is only one letter long.
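A guess at the mechanism, since the thread does not say which tool produced the figure: a tokenizer pattern that requires at least two characters silently drops single-letter words. The two patterns and the `keep` helper below are illustrative, not taken from any particular tool.

```python
import re

# Illustrative tag list; the thread mentions R, RNA-seq, python and perl.
tags = ["R", "RNA-seq", "python", "perl"]

two_plus = re.compile(r"\w[\w'-]+")  # requires at least two characters
one_plus = re.compile(r"\w[\w'-]*")  # also accepts single-character tokens

def keep(pattern, words):
    """Return the words that the pattern matches in full."""
    return [w for w in words if pattern.fullmatch(w)]

print(keep(two_plus, tags))  # "R" is filtered out
print(keep(one_plus, tags))  # "R" survives
```

If your tool has a minimum word length or a token pattern setting, that is the first thing to check.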


Another word cloud, this time based on the words in the 1000 most highly voted post titles and made with the https://wordart.com service:

[Image: word cloud of words from the most highly voted post titles]
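Counting words in post titles could be sketched as follows. The titles, the stopword list, and the `title_word_counts` helper are all made up for illustration; the real input would be the titles of the most highly voted posts.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stopword list; real lists are much longer.
STOPWORDS = {"a", "an", "the", "to", "in", "of", "for", "how", "and", "is", "from", "with"}

def title_word_counts(titles):
    """Count lowercase words across titles, skipping stopwords."""
    counts = Counter()
    for title in titles:
        # The pattern accepts single-character tokens, so "R" is not lost.
        for word in re.findall(r"[A-Za-z0-9][A-Za-z0-9'-]*", title.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts

# Made-up titles standing in for the real post titles.
titles = [
    "How to plot a word cloud in R",
    "Word cloud from tag counts",
]
word_counts = title_word_counts(titles)
print(word_counts.most_common(3))
```

The resulting counts can be pasted into a service such as wordart.com or fed to any word cloud library.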
