Hi everyone,
Whenever I teach an introductory class about Bioinformatics, I like to use this word cloud - I feel it gives a quick glimpse of the field. However, it is outdated (from 2011).
So I want to create an up-to-date figure. I understand that I can copy the tag counts manually from the tag content, using only the first few pages, since the counts quickly drop below 15 or so.
However, ideally I would like to have the data for each year separately, to show how topics change over time. Unfortunately, I am a new user (first post here) and do not have the privileges to download the database as shown in the blog post above.
Would anyone with access be able / willing to fetch and share these data?
Thank you very much!
Nice application! Tags are very often used incorrectly, though. An alternative approach could be to use the titles/abstracts of recent (bioinformatics) papers, although filtering those terms to extract the bioinformatics terminology obviously requires some more work.
Well, using paper title/abstracts would certainly be useful, although in a different way.
For my teaching purposes, I actually want to use community-based information, in the sense that it reflects user needs. I expect my students to face many of the questions that other users experience. For instance, note that the tag software error has been used 1838 times - I would then address the value of resources such as Biostars. It also shows trends in language use (e.g., noticeable drop in perl, increasing importance of python and especially R).
Some noise due to incorrect tag usage should not be a big deal. In any case, I will filter for only the most used ones, so the signal will still be there.
Thanks for the input!
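For what it's worth, the "filter for only the most used ones" step could look roughly like the sketch below. It assumes the tag counts are already in a plain Python dict; the function name `most_used`, the default threshold of 15 (taken from the cutoff mentioned in the question) and the toy tags are my own placeholders, not anything Biostars provides.

```python
from collections import Counter


def most_used(tag_counts, min_count=15, top_n=None):
    """Keep tags used at least min_count times, most frequent first.

    tag_counts: dict mapping tag name -> number of uses.
    top_n: optionally keep only the top_n most frequent tags.
    """
    kept = Counter({tag: n for tag, n in tag_counts.items() if n >= min_count})
    return kept.most_common(top_n)


# Only the "software error" count (1838) comes from this thread;
# the other two entries are made-up placeholders for illustration.
example = {"software error": 1838, "toy-tag-a": 40, "toy-tag-b": 3}
print(most_used(example))
```

Rarely used (and probably mis-tagged) entries fall below the threshold, so the noise mostly disappears before the cloud is drawn.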
Ha, funny that you pick the example of software error. Because it is used so frequently, it becomes the first suggestion users get when making a new post, leading it to be used more and more. Often it's actually a user error :)
Even better! Nice to know this, because in classes students show similar behavior. They often jump the gun and call anything a software error, when it is usually just a typo. This will make a good example!
Here is a list of tags with counts:
http://data.biostarhandbook.com/data/biostar-tags.txt
cc Istvan Albert, Devon Ryan and Pierre Lindenbaum
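In case it helps, here is a minimal sketch for reading that list into Python. I have not checked the exact layout of the file, so this assumes one tag per line with the count as the last whitespace-separated field (which would also cope with multi-word tags such as software error); lines that do not fit that pattern are skipped. The function names are my own.

```python
from urllib.request import urlopen

TAGS_URL = "http://data.biostarhandbook.com/data/biostar-tags.txt"


def parse_tag_counts(text):
    """Parse lines of the assumed form '<tag> <count>' into a dict."""
    counts = {}
    for line in text.splitlines():
        parts = line.strip().rsplit(None, 1)  # split off the last field
        if len(parts) != 2:
            continue  # blank line or a single field: not a tag/count pair
        tag, raw_count = parts
        try:
            count = int(raw_count)
        except ValueError:
            continue  # last field is not a number (e.g. a header line)
        counts[tag] = counts.get(tag, 0) + count
    return counts


def fetch_tag_counts(url=TAGS_URL):
    """Download the tag list and parse it (requires network access)."""
    with urlopen(url) as response:
        return parse_tag_counts(response.read().decode("utf-8"))
```

If the file turns out to put the count first or to use another separator, only the `rsplit` line needs to change.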