Question

Locations Of Plots Of Quantities Of Publicly Available Biological Data

6

14.5 years ago

Gotgenes ▴ 460

There's a cliché in talks and presentations these days: a plot demonstrating the rapid (typically exponential or super-exponential) growth of publicly available biological data of one kind or another (e.g., sequence data, yeast two-hybrid data). These plots are frequently juxtaposed against a plot of Moore's law. You know the type. You have probably even used or made such a plot if you're at this site.

It's not always obvious where to find these plots. Surprisingly (disappointingly, even), major clearinghouses for biological data such as GenBank and Gene Expression Omnibus (GEO) don't provide plots of their growth in any obvious location, let alone on their front pages (where it makes the most sense to display such positive trends). Let's compile a list of where to find these plots, including, but not limited to:

Publications (decent)
Open-access publications (good)
Sites that provide up-to-date plots (better)
Scripts or programs that generate plots on the fly (excellent)

visualization • 6.2k views

updated 13.7 years ago by Casey Bergman 18k • written 14.5 years ago by Gotgenes ▴ 460

3

I think it would also be interesting to post code that can generate these plots. The data are often available, although often not in the best format, for those who'd like to try a roll-your-own approach.

14.5 years ago by Neilfws 49k

2

Good to see you here!

14.5 years ago by Paulo Nuin ★ 3.7k

Ram · Answer 1 · 2010-10-21

6

14.5 years ago

Mary 11k

We started this the other day. See this thread: Exponentially Increasing Genomes Slide. Another one I like that hasn't come up yet is the growth of GeneTests, the diseases for which testing is available: http://www.ncbi.nlm.nih.gov/projects/GeneTests/static/whatsnew/labdirgrowth.shtml

updated 5.6 years ago by Ram 45k • written 14.5 years ago by Mary 11k

1

Was about to write the same thing; you were 3 secs faster ;)

14.5 years ago by Michael Schubert ★ 7.1k

1

Thanks. I failed in picking my search terms to look for an existing question. I don't know if we should close this question as a duplicate, as I'm interested in any type of (high-throughput) biological data.

14.5 years ago by Gotgenes ▴ 460

0

Then you may want to refine your question so that it is not a duplicate ;)

14.5 years ago by Michael Schubert ★ 7.1k

score 5 · Answer 2 · 2010-10-21

Data for the growth of the number of articles in MEDLINE can be found here:

http://www.nlm.nih.gov/bsd/licensee/baselinestats.html

There is some time lag in interpreting numbers from the MEDLINE baseline files. For example, good data on the growth of MEDLINE through 2008 can be found in the 2010 baseline statistics: http://www.nlm.nih.gov/bsd/licensee/2010_stats/2010_Totals.html

EDIT 1: Data for the growth of the number of GeneRIFs in Entrez Gene can be found here:

http://www.ncbi.nlm.nih.gov/projects/GeneRIF/stats/

EDIT 2: Data for the growth of the number of GWAS studies in the Human Genome Epidemiology database:

http://hugenavigator.net/HuGENavigator/startPageWatch.do

Ram · Answer 3 · 2010-11-02

I already added sequence data growth in UniProt in the other question. As you are interested in various data categories, here is the exponential growth of RCSB-PDB from the 1970s to date. Kudos to the RCSB-PDB team for providing the data and the graph in a convenient way.

EDIT by RamRS: Khader's link to his own answer is dead and does not point to a post on biostars.org because the post seems to have been lost before migration. Here is a link to an archived version of the post: https://web.archive.org/web/20111124051054/http://biostar.stackexchange.com/questions/2966/exponentially-increasing-genomes-slide/2973

Here is a picture of his answer:

[Image: screenshot of the archived answer]

score 4 · Answer 4 · 2010-10-21

4

14.5 years ago

Michael Schubert ★ 7.1k

You might also want to take a look at this:

Björk B-C, Welling P, Laakso M, Majlender P, Hedlund T, et al. (2010) Open Access to the Scientific Journal Literature: Situation 2009. PLoS ONE 5(6): e11273

edit: there are some issues with the paper; see Lars' blog post.

14.5 years ago by Michael Schubert ★ 7.1k

Ram · Answer 5 · 2010-11-05

Just a brief note on a way to generate "growth of database" data yourself, at least for the Entrez databases.

Most of the Bio* projects include an EUtils library. The BioRuby module has a useful method, esearch_count, which counts the number of results for a query. As an example, you could retrieve total publications in PubMed for years 2000-2010 like this:

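#!/usr/bin/env ruby
# Print the number of PubMed records per publication year as tab-delimited text.
require "rubygems"
require "bio"

# NCBI asks for a contact address; replace this placeholder with your own.
Bio::NCBI.default_email = "me@me.com"
ncbi = Bio::NCBI::REST.new

2000.upto(2010) do |year|
  # [dp] restricts the search term to the publication date field
  count = ncbi.esearch_count("#{year}[dp]", {"db" => "pubmed"})
  puts "#{year}\t#{count}"
end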

Redirect the output to create a tab-delimited file of year and count. Here, we're searching the DP (Date of Publication) field in PubMed. You could substitute any Entrez database, search term(s) and years.
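Once you have year/count pairs in that tab-delimited form, one rough way to check the "exponential growth" claim is to fit a straight line to the logarithm of the counts and read off a doubling time. The following is only a minimal sketch: the helper names parse_counts and fit_exponential are made up for illustration, the sample numbers are synthetic (not real PubMed counts), and it assumes every count is positive and at least two distinct years are present.

```ruby
# Parse "year<TAB>count" lines into [[year, count], ...], skipping blank lines.
# (Hypothetical helper name; not part of BioRuby.)
def parse_counts(text)
  text.each_line.map(&:strip).reject(&:empty?).map do |line|
    year, count = line.split("\t")
    [Integer(year), Integer(count)]
  end
end

# Ordinary least-squares fit of ln(count) = intercept + rate * year.
# Returns the per-year growth rate and the implied doubling time in years.
# Assumes all counts are positive and the years are not all identical.
def fit_exponential(rows)
  raise ArgumentError, "need at least two rows" if rows.size < 2

  xs = rows.map { |year, _| year.to_f }
  ys = rows.map { |_, count| Math.log(count) }
  n = xs.size.to_f
  mean_x = xs.sum / n
  mean_y = ys.sum / n
  sxy = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  sxx = xs.sum { |x| (x - mean_x)**2 }
  rate = sxy / sxx
  { rate: rate, doubling_time: Math.log(2) / rate }
end

# Synthetic illustration only (NOT real database counts): doubles every year.
sample = "2000\t1000\n2001\t2000\n2002\t4000\n2003\t8000\n"
fit = fit_exponential(parse_counts(sample))
puts format("rate = %.3f per year, doubling time = %.2f years",
            fit[:rate], fit[:doubling_time])
```

To use it on real output, replace the sample string with the contents of your redirected file, e.g. File.read("counts.tsv") (file name assumed). A fit on log counts is a quick sanity check rather than a proper growth model, so treat the doubling time as a ballpark figure.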

Ram · Answer 6 · 2010-10-30

3

14.5 years ago

Bio_X2Y ★ 4.4k

The Silva website plots the growth of ribosomal RNA databases, for example: http://www.arb-silva.de/documentation/background/release-104/

updated 5.6 years ago by Ram 45k • written 14.5 years ago by Bio_X2Y ★ 4.4k

score 3 · Answer 7 · 2010-11-03

SCOP lists the statistics of its release history in tabular form for the last 12 years.

Scop Classification Statistics

I agree with Khader that PDB has done an excellent job of reporting statistics on its entries. They have something called the histogram menu, which can easily generate statistics on current entries based on various criteria.

Example: Source Organism (Gene Source) Histogram

score 3 · Answer 8 · 2010-11-04

3

14.5 years ago

Gotgenes ▴ 460

There is a news article from October 2010 in Science that has a plot of the growth of human SNP data, particularly with regard to the 1000 Genomes Project.

14.5 years ago by Gotgenes ▴ 460

0

Bump! Not an OA article.

14.5 years ago by Khader Shameer 18k

score 3 · Answer 9 · 2010-11-05

3

14.5 years ago

Rob ▴ 30

A recent paper with an updated "Growth of GEO" plot:

Le et al. Cross-species queries of large gene expression databases. Bioinformatics (2010) vol. 26 (19) pp. 2416-23

14.5 years ago by Rob ▴ 30