Forum:Is The Biostar Fading? (Updated For 2014)
11
31
13.8 years ago
Joachim ★ 2.9k

Lately I have been wondering whether BioStar is losing its appeal, or whether that is just a subjective impression of mine. My question: is the BioStar fading?

My own objective stab at this can be found here: http://joachimbaran.wordpress.com/2011/03/07/biostar-fading/

UPDATE (March 2012): I have redone the analysis and posted the results as "BioStar: Is the BioStar fading? An Annual Follow-Up." http://joachimbaran.wordpress.com/2012/03/11/biostar-second-analysis/

UPDATE (March 2013): This year I have added more charts to my analysis and posted the results again on my blog as "Uh-oh, Biostar: Three Years of User Metrics Analysis" http://joachimbaran.wordpress.com/2013/03/15/uh-oh-biostar/

UPDATE (March 2013, follow up to criticism): It has been pointed out that I include too many other post types besides questions/answers, which influences my analysis and statistics. Having also taken input from the Biostar API now, I re-ran the data mining script and it turns out that the web-crawler already ignores non-question posts. There are only 36 data points for "Forum" (type id 6) Biostar posts, but about 47000 question/answer related data points. In other words: the new Planet or blog posts on Biostar do not influence my analysis.

UPDATE (April 2014): The fourth annual analysis has been posted. It is the first time that I incorporated data from the Biostar RESTful API. See: "Analyzing the Biostar: Fourth Anniversary" http://joachimbaran.wordpress.com/2014/04/08/biostar_analysis_4th_year/

meta biostars • 13k views
ADD COMMENT
7

If anything, your results show me that it is going very strong! It is expected that these statistics go down, just as the returns of any well-performing stock fund will go down as more people start joining in. Also, some plots do not even seem to show a slope significantly different from zero; one in fact seems to go up, despite the slope. Moreover, your plots do not take seasonal effects into account. We just had Christmas. Try comparing the last 7 months with the same months a year ago. Also, include plots like the number of active users, etc., to get the full picture.
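
The same-month comparison suggested here could be sketched roughly as follows. This is only an illustration: the function name `year_over_year` and the monthly counts are made-up placeholders, not Biostar data.

```python
def year_over_year(monthly_counts, months=7):
    """Compare the last `months` months with the same months one year earlier.

    monthly_counts maps (year, month) -> count.
    Returns ((year, month), current, previous, ratio) tuples, oldest first.
    """
    year, month = max(monthly_counts)  # most recent month in the data
    rows = []
    for _ in range(months):
        current = monthly_counts.get((year, month))
        previous = monthly_counts.get((year - 1, month))
        if current is not None and previous:  # skip gaps and zero baselines
            rows.append(((year, month), current, previous, current / previous))
        # step back one calendar month
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return rows[::-1]


# Placeholder counts (NOT Biostar data): posts per month, Aug 2009 to Feb 2011.
example = {}
year, month, value = 2009, 8, 40
while (year, month) <= (2011, 2):
    example[(year, month)] = value
    value += 5
    month += 1
    if month == 13:
        year, month = year + 1, 1

for (y, m), cur, prev, ratio in year_over_year(example):
    print(f"{y}-{m:02d}: {cur} vs {prev} a year earlier (x{ratio:.2f})")
```

Comparing like with like in this way keeps a December dip from being mistaken for a trend.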

ADD REPLY
6

It would be good to compare this data against other Stack Exchange websites, maybe some of the ones listed here: http://stackexchange.com/sites

ADD REPLY
5

+1 for taking the time to collect the data, carry out the analysis and present the results freely.

ADD REPLY
3

There is relatively strong support to convert this to a community wiki (see comments to lh3 below). Are there any objections to doing this?

ADD REPLY
1

+1 for quantifying data!

ADD REPLY
1

I have redone the analysis and posted the results in my blog again: http://joachimbaran.wordpress.com/2012/03/11/biostar-second-analysis/

ADD REPLY
1

When you first posted about this supposed fading (2 years ago, April 2011) the site had 1,800 questions, 5,200 answers and 8,000 comments. By today (April 2013) we've faded to 9,000 questions, 19,000 answers and 33,000 comments. Since last March we have had 465,000 unique visitors (as measured by Google), the most ever recorded year-over-year.
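
For readers who want to work with these two snapshots, growth factors and per-question ratios can be derived along these lines. The numbers are the ones quoted in this comment; the variable names are just illustrative.

```python
# Snapshot figures as quoted in the comment above.
snapshots = {
    "April 2011": {"questions": 1_800, "answers": 5_200, "comments": 8_000},
    "April 2013": {"questions": 9_000, "answers": 19_000, "comments": 33_000},
}

old, new = snapshots["April 2011"], snapshots["April 2013"]

# How many times larger each total is after two years.
growth = {kind: new[kind] / old[kind] for kind in old}

# Answers per question at each snapshot (totals only, not a per-post average over time).
answers_per_question = {
    label: counts["answers"] / counts["questions"]
    for label, counts in snapshots.items()
}

for kind, factor in growth.items():
    print(f"{kind}: x{factor:.2f}")
for label, ratio in answers_per_question.items():
    print(f"{label}: {ratio:.2f} answers per question")
```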

I think that the success of a Q&A is measured primarily in the amount of knowledge that is distributed via the site combined with the rate of providing new advice. That could be anything: an answer, a comment, a request for clarification or even a post closing for duplication. As long as the front page is mostly green we're doing really well.

ADD REPLY
1

This is one of my favorite threads. I will take this opportunity to post yearly snapshot numbers on content:

April 8, 2014: 10,336 users have contributed 93,655 posts, distributed as 14,187 questions, 27,077 answers and 48,455 comments.

ADD REPLY
0

The site has 34,739 registered users that created 242,051 posts. These are distributed as 44,767 questions, 60,002 answers and 133,902 comments.

ADD REPLY
0

Thanks for all the feedback! It would have been interesting to carry out more statistics on this topic, but unfortunately I did not have more time to investigate this. Thanks, again.

ADD REPLY
0

Interesting. A lot more analysis could be performed if biostar was a choice in this explorer tool: http://data.stackexchange.com/

ADD REPLY
0

+1 For the use of R and Ruby - a truly powerful combination

ADD REPLY
0

Hi Joachim, your analysis is flawed, and several people, including me, have told you about that. You are in fact jumping to conclusions based on a declining trend in the cumulative vote-score based on the age of the question. The effect you see and - falsely - interpret as a decline, is in fact expected and would be observable for any cumulative quantity (at constant rate). To present something informative, you would have to plot "the average number of votes/answers per month or per time window per question".

ADD REPLY
0

Thank you (2014)

ADD REPLY
0

All your posts are now private?

ADD REPLY
8
13.8 years ago
lh3 33k

It is always intriguing to interpret data.

We use the "last edit time" as the X axis. However, I noticed that a couple of months ago someone edited lots of old questions, presumably for better clarity. Is this why we get many more data points for the 17th month? Perhaps using the time when a question was first asked would be better. On the other hand, I agree that even if you used the asking time as the X axis, the plot would not change much.

Furthermore, there are multiple possible explanations for a negative trend other than biostar fading. For example, I can think of: a) new users tend to be less familiar with biostar and raise less relevant questions; b) new questions tend to be duplicates of old ones; c) most answers and votes come from long-standing users. I do not know which is the dominant factor, or even whether there is one dominant factor. Anyway, I just hope c) is not the leading cause, because c) would mean that biostar has not attracted highly experienced users recently or has been too specific to users with particular backgrounds.

BTW, someone may argue this question should be sent to the google group because it is irrelevant to computational biology or bioinformatics, but personally I like to see discussions here. Thank you for raising this question here, Joachim.

ADD COMMENT
7

IMHO meta questions that are discussed on the site should be marked as community wikis in order to have users posting them (and the answers) out of genuine interest and not reputation points.

ADD REPLY
1

The "someone" who edited a lot of questions a while back was probably me. I was making an effort to clean up/standardize the tags a bit :-)

ADD REPLY
1

I would normally suggest that "meta" discussions about the site go to the Google group. However, since this one involves statistical analyses, I'm happy to see it discussed here.

ADD REPLY
0

I agree with Michael's suggestion.

ADD REPLY
7
13.8 years ago
Neilfws 49k

I think that this kind of statistical analysis is very valuable. As you say, we all have subjective or anecdotal opinions about BioStar, but the data don't lie.

I'd suggest that more analysis is required. As you say in the blog post, there are few clear trends. I don't know if that's because of insufficient data, or because we need to explore the data more thoroughly. Perhaps other users would be interested in contributing their own analyses.

More people participating (a good thing) but fewer votes per question (a bad thing) certainly tallies with my subjective experience. It's tempting to speculate about what that means in terms of question quality and site etiquette, but in light of what I said about subjective impressions versus data, I'll restrain myself :-)

ADD COMMENT
7
13.8 years ago

One year ago, when biostar was just starting, I used to up-vote questions and answers more easily, in order to help new people reach the minimum score needed to add comments, down-vote, etc.

Now that the user base is much bigger, I tend to up-vote less frequently, and I have also started to down-vote with higher frequency. Moreover, as the website has grown, the number of poorly posed questions has increased: a lot of new users recognized that biostar is a good place to ask a question, but they are not familiar with how to pose a question properly, so they get downvoted.

ADD COMMENT
7

That is quite a paternalistic position you take, especially towards new (and thus inexperienced) users. Down-voting beginners is quite discouraging, and in my opinion unnecessary. If a question or answer is posed poorly, it will stay at 0. You don't need to explicitly push it over the edge. I believe that the down-vote should be used only on people with bad intentions (vandalism or spam).

ADD REPLY
2

I agree that it is better to prompt the person posing an unclear question. However, if that is done and the person does not come back and respond to the comments in 24 hours or so, I find it fair to down-vote. Also, I think it is perfectly fair to down-vote poor answers that might otherwise waste the time of the person asking the question.

ADD REPLY
1

I think downvoting should not be considered bad per se, as long as there is a helpful comment on why the question/answer was downvoted and how it can be improved. On the other hand, one should retract the downvote if the user indeed improves the post.

ADD REPLY
0

I almost never down-vote; most questions can be improved with prompting and/or editing. In fact, I'm unclear as to why one would down-vote, since spam/vandalism is better just deleted.

ADD REPLY
6
13.8 years ago

An aspect I did not see mentioned is that there are now already a lot of questions and answers on the site. So new users might not ask or answer as many questions, since they can already find the questions and the answers. And people might not readily vote for questions or answers they have seen before.

ADD COMMENT
0

In that case a well-educated user will upvote the questions and answers that saved them from having to ask a question, so the number of upvotes will grow faster than the number of new questions. If this is not happening, then either your theory is not the predominant scenario or the new users are a bit lazy.

ADD REPLY
0

Actually it is happening. I do get votes for somewhat older answers (I am not active that long) quite regularly.

ADD REPLY
6
13.8 years ago

First off thanks for taking the time to analyze the data. It is always great fun to look at various aspects.

Now I think your title is a bit sensationalistic and perhaps, as you yourself allude in the last sentence, it may not even be fully supported by your own observations. Hey it almost reads like a scientific paper on some latest and greatest observation ;-) !

I think it is very important to put everything in the context of a community and recognize the overarching human dimension that shapes a site like BioStar. Contributing to BioStar is not easy; answers often require a substantial commitment. Bioinformatics is also a nascent field: there are far more beginners than experts, and the experts will have to shoulder a greater and greater responsibility.

As you yourself state, there are more people using BioStar: more questions are asked and, as far as I know, every single question gets at least a comment, while the vast majority get answered. That seems to be the most important metric: can people find some level of assistance and get value from this site?

There are also signs that we need to provide tools that allow the community to educate itself, whether by asking better questions, finding existing answers, voting or rating answers. I am not a big fan of fast growth, exactly because it often brings new challenges.

ADD COMMENT
1

1 million visits and growing: http://i.imgur.com/ID28t.png

ADD REPLY
4
13.2 years ago

Fading? Because many basic questions have been posed and mindful members refer to the previous post? Perhaps that is one view.

But there are indeed many new registrants posing questions - some are old (i.e. repeated) questions, some are slight modifications of old themes, but some queries are certainly new topics.

I think there are a few of us who believe that BioStar is likely in a transition phase. But then again, maybe the site will drift back to a so-called "growth phase" that it enjoyed in its first months/year with the recent publication of a manuscript in PLoS Computational Biology describing BioStar. Such publicity is likely to bring many new visitors and hopefully registrants to the site. And to that point, I hope that BioStar will see more experts join. Thus, the other day, I was happy to see Michael Reich register and answer a question about GenePattern and RNA-Seq read depth. He is a leader of the GenePattern team. That type of engagement, which may well occur with the exposure garnered from the PLoS Comp Biol paper, will certainly help BioStar to mature beyond its initial growth phase.

ADD COMMENT
4
13.2 years ago
Iain ▴ 260

Very interesting post!

A paper about Biostar just came out in PLOS Computational Biology [1], so it would be interesting to see if it affects any of the statistics.

ADD COMMENT
0

I was actually thinking the same, even though I would re-run the statistics next March (1 year later).

Other metrics than the ones I used might be interesting too, but I am not sure if all of them are publicly available: views per month, active users vs registered and unregistered users.

ADD REPLY
0

It would be nice to see if there are more "useful" questions after the publication, i.e. if people followed the guidelines, would their questions get more votes?

ADD REPLY
4
12.8 years ago
Michael 55k

Hi Joachim, I am sorry to say this, but your analysis is flawed, or rather your interpretation of it is, although the attempt is very valuable and I appreciate it. That is also the reason for the divergence between the subjective impression you state in your post and the stable declining trend that you see in your data and falsely interpret as an indication of decreasing quality. Two major issues have been overlooked (or, to cite Neil: the data never lie, but the people interpreting them may). In fact, there is nothing surprising about the trend.

  1. You are neglecting some of the most important statistics for evaluating popularity: the total number of posts and the number of page visits. The higher the total number of posts, the smaller the share of them that the average reader has time to read and vote on, which reduces the average score. This effect is amplified by the 'stack' design of the site, which makes questions drop out of view faster as more new questions come in.

  2. The second issue is even more important. The vote score is a cumulative quantity, in the sense that the score adds up over time. This is due to the tendency, which you also describe, towards few controversial votes and an overall very friendly voting pattern, with many more up-votes than down-votes (I don't have data to back this up, but it is quite easy to see). Under this interpretation a slight declining trend is natural, simply because older posts have had more time to accumulate votes. Conversely, an increasing trend would indicate that posts get up-voted first and voted down later (something that is very unlikely due to 'social engineering' or 'flock' effects).

Instead the relatively slight slope in vote score indicates only that most posts receive the majority of votes in a very short period of time after posting.
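
A toy model can make the cumulative argument concrete. The sketch below assumes, purely for illustration, that a post receives `first_month_votes` votes in its first month and that the monthly votes then shrink by a constant factor `decay`; neither the function nor the parameter values come from the actual Biostar data.

```python
def expected_score(age_months, first_month_votes=3.0, decay=0.3):
    """Expected cumulative votes of a post that is `age_months` old.

    Assumption (illustrative only): votes received in month t after
    posting equal first_month_votes * decay**t, for every post alike.
    """
    return sum(first_month_votes * decay ** t for t in range(age_months))


ages = list(range(1, 13))
scores = [expected_score(age) for age in ages]

for age, score in zip(ages, scores):
    print(f"age {age:2d} months: expected score {score:.3f}")
```

Under these assumptions every post is equally good, yet older posts always score at least slightly higher, and with a strong decay the difference between a 3-month-old and a 12-month-old post is small. That is the "slight slope" pattern described above.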

I hope that this gives some hints on how to improve your analysis.

ADD COMMENT
0

Thank you for pointing out some weak points of my analysis. I also think that it can be improved and perhaps someone will build their own analysis on top of my evaluation scripts.

Regarding your points, I was actually surprised to see the clear decline in all trends like last year again. I also think that your first point is not necessarily true, because there are not really that many questions posted overall and since BioStar gained many more users I would expect that newer posts get read more often than in the first few months. I am afraid I do not understand your second point.

ADD REPLY
0

Hi, the second point is the most important one. In your analysis you look (among other things) at the average score of questions, and in your previous blog post you pointed out that you sort questions by 'age' via the 'last edited' option. What I am saying is simply this: comparing, e.g., questions that were asked 12 months ago with questions asked 1 week ago, the older questions will surely have more votes on average, because there was more time to vote on them.

ADD REPLY
0

Hello, I get your point now and I agree that older questions have had more time to receive answers/votes. However, I measure time in months in my analysis and I somehow doubt that this is a time scale on which the maturity of posts matters. Perhaps I will get around to investigating this a little more closely next year. Thanks for clarifying, though.

ADD REPLY
0

I don't get what that has to do with maturity of posts? I am only giving a logical explanation for the trend, and this is true on average for any question irrespective of quality. I will however make some edits to my answer and propose some improved statistics.

ADD REPLY
0

Hm, then I do not get you again. I thought that you were saying that older posts have had more time to accumulate votes and answers, whereas younger posts have not had enough time to gain them yet. I think so too, but only if you look at posts over a short timeframe in the immediate past (let's say, one or two weeks).

ADD REPLY
4
11.8 years ago
Michael 55k

Fly traps

I hope I can once and for all help to clarify this.

Imagine you wish to test a bunch of fly-traps (a.k.a. questions; you know, those sticky yellow things that catch flies, i.e. votes, answers, whatever). You set out to test them: every month you put up a certain number of fly-traps, each marked with the date of first exposure, and once a year you come back and count the flies in them. Let's not assume anything about the quality or durability of the fly-traps for the moment, except that they have an infinite lifetime.

After some time you come back and decide to do a survey of your fly-traps and see that the ones you put up 12 months ago have more flies on them than the ones you put up last month. Now, we certainly can conclude that:

  • Old fly traps are much better than new fly traps because old fly traps have more flies on them.

  • The quality of fly trap production is on the decline.
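
The fly-trap thought experiment is easy to simulate. In the sketch below every trap is equally good (the same hypothetical daily catch probability), so any difference in raw counts comes from exposure time alone; all names and parameter values are invented for illustration.

```python
import random


def simulate_traps(max_age_months=12, traps_per_month=50,
                   daily_catch_prob=0.1, seed=0):
    """Return {age_in_months: mean flies per trap} for equally good traps.

    Each trap catches one fly per day with probability `daily_catch_prob`,
    and a month is taken to be 30 days (both are illustrative assumptions).
    """
    rng = random.Random(seed)
    mean_flies = {}
    for age in range(1, max_age_months + 1):
        days_exposed = 30 * age
        totals = [
            sum(rng.random() < daily_catch_prob for _ in range(days_exposed))
            for _ in range(traps_per_month)
        ]
        mean_flies[age] = sum(totals) / traps_per_month
    return mean_flies


raw = simulate_traps()
# Normalize by exposure time: flies per trap per month.
per_month = {age: flies / age for age, flies in raw.items()}

for age in (1, 6, 12):
    print(f"age {age:2d} months: {raw[age]:5.1f} flies per trap, "
          f"{per_month[age]:.2f} per month of exposure")
```

The raw counts favour the old traps, while the per-month rates come out roughly equal for all ages, which is what one would expect when trap quality has not changed.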

ADD COMMENT
1

Which you could change, if you normalized your data over fly-trap age.

ADD REPLY
0

Yes, but this wasn't done.

ADD REPLY
1

Would we not see older data levelling out on a plateau even without normalization? Putting it differently: the older fly-traps will attract fewer and fewer flies until their fly-catching ability is saturated.

ADD REPLY
2

We can still vote on old questions and add new answers/comments. They will not saturate. Surely new votes etc. accumulate slowly, but the trends in your plots are not strong. You have to separate the two effects to draw a convincing conclusion. That said, it is still unfortunate to see the reduced activity of quite a few experts (e.g. Chris, David Q, Lars among others). The quality of a Q&A site is largely determined by its experts.

ADD REPLY
0

I see your point about the saturation. What is your opinion on the (roughly) constant number of answer/comment postings, but increasing number of questions being asked (plot "User Activity")?

ADD REPLY
1

Note that I am not denying that, for example, the voting score is lower for new questions. This is actually in line with my gut feeling, and I do think the current scoring system discourages voting. I just wanted to point out that you could not use saturation to address Michael's concern.

ADD REPLY
0

Alright, I got it now. Saturation was not a good choice of wording. I meant that the votes level off at a "voting score" plateau at some point. Whilst the score is unrestricted (so no saturation can be reached), I think it cannot be assumed that the downslope in voting scores will always look like it does now. Otherwise, people would have to keep voting really old posts up. Anyway, thanks for clarifying.

I am actually not very familiar with the scoring system change that occurred when Biostar moved from StackExchange. I just cannot vote right now (I can click to vote, it looks like I voted, but when I come back to this post, the scores are back to before I voted). C'est la vie.

ADD REPLY
0

Thanks for the clarification, Michael. For that very reason, I have included the other charts in my blog post and spent much less emphasis on the charts that I used in the last years. Quality was not the primary focus of this year's blog post.

ADD REPLY
3
13.8 years ago
Samuel Lampa ★ 1.3k

IMO, the name "BioStar" is not optimal, as it does not immediately communicate the purpose of the site. Neither is it (yet) as well known as Stack Overflow. An example of a more successful choice of name, IMO, is "Semantic Overflow", which draws on how well known Stack Overflow is and makes it immediately clear that it is the "Stack Overflow for semantic tech".

... just something that struck me from the first visit. I was close to surfing away and forgetting about BioStar, until I in the last moment realized that, "Hey, this is the stack overflow for Bioinformatics, I better bookmark it".

ADD COMMENT
3

I think you are right that the name isn't perfect, since you would sooner think of Hollywood than of bioinformatics when you hear it. But would "Semantic Overflow" really be better? I think in the end names don't really matter. It is the quality, and our activity to spread the word about that quality, that matters.

ADD REPLY
3

My reaction to the name is not so extreme, but I agree that the purpose of the site could be made more clear. I think we need an explanatory text box on the front page (or at least a subheading) which includes the word "bioinformatics".

ADD REPLY
2

It is always easy to criticize, but I do not see a suggestion for what would have been a better name. I think you overestimate how well known Stack Overflow is. I personally had never heard of it before I ran into BioStar, so a name ending in "Overflow" would not have given me any association other than the negative connotation of the word "overflow". I do agree that we should try to somehow make it clearer that the site is about bioinformatics rather than general biology, though.

ADD REPLY
0

@Lars: Yes, well, the "Overflow" name might be better known to pure software developers (it seems that almost any software question you google, in the most popular languages, yields some results from Stack Overflow), so maybe it is not optimal either.

The best would be something more descriptive, I guess. I also tend to dislike the name's focus on "being a star" rather than on "helping each other"; I'd rather see the focus on the latter.

ADD REPLY
0

But anyhow ... it might not be realistic to think about a name change anyhow, so then I better let this discussion go and focus on spreading the word instead. And after all it's a great thing we have the site, and thanks to all who put their efforts into it!

ADD REPLY
0

The biostar name automatically makes me think of the bio* projects, i.e. bioperl, biopython... (ad nauseam). Also, I second not knowing what Stack Overflow was until after I became acquainted with biostar.

ADD REPLY
2
11.8 years ago

I have noticed that your analysis includes blog posts from the planet feed. Those are not posted by Biostar users and, moreover, cannot be answered, and there are quite a few of them. That affects all of your conclusions. You should redo it with the API and remove the Blog type posts from the analysis.

You can get statistics on Biostar from the API. For example, the activity as of one day ago can be seen at:

http://www.biostars.org/api/stats/1/

The number indicates how many days into the past to go from the current date. The output is JSON data. To get information on a post, use this:

http://www.biostars.org/api/post/1/

Where the number identifies the post.
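
A minimal sketch of calling these two endpoints from a script might look like this. The URL patterns are the ones quoted above; the helper names are invented here, and nothing is assumed about the fields inside the returned JSON.

```python
import json
from urllib.request import urlopen

API_ROOT = "http://www.biostars.org/api"  # base URL as quoted in this answer


def stats_url(days_ago):
    """URL of the site statistics as of `days_ago` days before today."""
    return f"{API_ROOT}/stats/{int(days_ago)}/"


def post_url(post_id):
    """URL of the JSON description of the post with the given id."""
    return f"{API_ROOT}/post/{int(post_id)}/"


def fetch_json(url, timeout=10):
    """Download `url` and decode the JSON body into Python objects."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With network access, `fetch_json(stats_url(1))` would then return the decoded statistics for one day ago, provided the endpoints still respond as described here.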

ADD COMMENT
0

Not quite sure how I would have included planet feed posts. As far as I can tell, planet feed post URIs have the structure "biostars.org/linkout/" and other posts adhere to "http://www.biostars.org/p/...". Even if those posts were included, the vote-related stats would still show a decline, because the corresponding charts plot only data-mined entries with a vote.

All my conclusions would not be affected by including planet feed posts either. For example, my strongest analysis result focuses on user participation. That should be completely independent from the planet feed -- unless the planet feed introduces pseudo-users on-the-fly!?

If anything, I would say that the possible inclusion of planet feed posts gave a false indication of the forum's growth in terms of questions asked (fewer questions would then have been asked). The number of comments and answers on the site would still show stagnation.

Anyway, thanks for pointing out the REST interface. It will be more convenient and less cumbersome to extract data from JSON next year...

ADD REPLY
0

The display link for a planet post is indeed this:

http://www.biostars.org/linkout/66591/

but the post can still be accessed directly via

http://www.biostars.org/p/66591/

Under normal usage a user cannot access the blog post directly but in your case since you are generating the post ids numerically you will hit these posts as well.
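
If the crawled records carry some indication of the post type, excluding unwanted types before the analysis is a one-line filter. The sketch below is hypothetical throughout: the record layout, the `type` key and the labels are placeholders, not the actual crawler output or API schema.

```python
def drop_post_types(posts, excluded_types, type_key="type"):
    """Return the posts whose `type_key` value is not in `excluded_types`."""
    excluded = set(excluded_types)
    return [post for post in posts if post.get(type_key) not in excluded]


# Hypothetical crawl result; ids, keys and labels are placeholders.
crawled = [
    {"id": 101, "type": "Question"},
    {"id": 102, "type": "Answer"},
    {"id": 103, "type": "Blog"},
]

kept = drop_post_types(crawled, excluded_types={"Blog"})
print([post["id"] for post in kept])
```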

How the analysis changes remains to be seen; it would be very strange, though, if your measures for user participation and site utility did not improve after removing a thousand posts with no answers. I would say that a measure that behaves like that is pretty useless.

As I said before, the API provides a clean data access and you should use that to produce what you need.

ADD REPLY
0

Thanks for pointing out the connection between the URIs.

Removing posts will not improve the statistics though. Please have a look at the R script that I made publicly available and check again with my blog post, because user participation is calculated independent of question/answer combinations. Site utility (is that activity?) will also drop, simply because data points would be removed from the data set -- most easily explained by the stacked bar chart.

I think that my blog post and its charts are a valuable tool to improve Biostar since the exposed deficits are all actionable. The metrics I have chosen go beyond what other platforms such as Google Analytics offer, which makes them even more useful in my opinion.

ADD REPLY
