Where Can I Get Stats For Previous Uniprot Release?
3
4
Entering edit mode
13.2 years ago

UniProt pages show the most recent release accounting:

Right now it's version 2011_09

Where can I find stats for 2011_08?

In particular I need the number of species.

uniprot • 5.6k views
13.2 years ago

When I wrote this: http://blog.fuzzierlogic.com/archives/425, the historical stats of SwissProt/TrEMBL weren't very publicly accessible. This is a situation that has, if anything, got worse. In my post I link to a page that at least had the sizes of SwissProt for old releases, but even that has now disappeared with the redesign of the ExPASy homepages.

In the tarball that you download for old releases (e.g. release 2011_08.tgz, 6.9GB), there is apparently a docs/ dir, and it may be that the info you need is in there somewhere. It does seem crazy to download nearly 7GB of data just to find this simple figure, though.

EDIT - Yes, the docs/ dir contains the speclist.txt (same in current release), which I think has the information you're after.


I already extracted the 7GB, but wasn't sure how best to get the count out of speclist.txt - grep -c ": N=" docs/speclist.txt gives 20,623 for 2011_08 and for the current one. I thought there were close to 400k species?
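
For anyone who wants to look past the single number that grep -c gives, here is a rough Python sketch that counts the same ": N=" lines and also tallies them by the one-letter kingdom column. The line layout in the regex (code, kingdom letter, taxon number, then ": N=") is my assumption about speclist.txt, and the sample lines are made up for illustration, so check both against your copy of the file.

```python
import re
from collections import Counter

# Assumed layout of the first line of each speclist.txt entry:
#   CODE  K  TAXID: N=Official name
# where K is a one-letter kingdom code. Verify this against the real file.
ENTRY = re.compile(r"^(\S+)\s+([A-Z])\s+(\d+): N=(.*)$")


def count_entries(lines):
    """Return (total entries, Counter of kingdom letters) for the given lines."""
    kingdoms = Counter()
    for line in lines:
        match = ENTRY.match(line.rstrip("\n"))
        if match:
            kingdoms[match.group(2)] += 1
    return sum(kingdoms.values()), kingdoms


# Made-up sample lines, only to show the shape of the input.
sample = [
    "XXXXA E       1: N=Example organism one\n",
    "                 C=Example common name\n",
    "XXXXB V      22: N=Example virus two\n",
]
total, kingdoms = count_entries(sample)
print(total, dict(kingdoms))
```

To run it on the real file, pass an open file handle instead of the sample list, for example count_entries(open("docs/speclist.txt", encoding="latin-1")). The regex is stricter than the plain grep pattern, so the totals could differ slightly.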


speclist.txt does not really have the information that you are looking for. It's a list mapping taxonomy codes to human-readable shorthand names. However, many (most) of these codes are not at the species rank, so counting them gives an answer that is wrong by orders of magnitude.

13.2 years ago

After reading @Simon's answer, I made this CRON script that will save Uniprot stats history and push it to GitHub Pages: http://alevchuk.github.com/uniprot-stats-history

I'm now running this script on one of our servers. It mirrors the UniProt pages every Wednesday, and if there are any changes they get pushed out to GitHub automatically.

If you are interested in how it works the code is on my Github profile.
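
The "only push when something changed" idea can be sketched roughly like this. This is not the actual script from the repository, just a minimal illustration: the function name save_if_changed and the file layout are hypothetical, and fetching the page and running git are left out.

```python
import hashlib
from pathlib import Path


def save_if_changed(target: Path, content: bytes) -> bool:
    """Write content to target only if it differs from what is already stored.

    Returns True when the file was created or updated, False when unchanged.
    """
    new_digest = hashlib.sha256(content).hexdigest()
    if target.exists():
        old_digest = hashlib.sha256(target.read_bytes()).hexdigest()
        if old_digest == new_digest:
            return False  # same page as last week, nothing to commit
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(content)
    return True
```

A weekly cron job could call this with the freshly downloaded page and only commit and push when it returns True.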


So far my archiving script has been doing its job without any human interference for 3 months: http://alevchuk.github.com/uniprot-stats-history/


5 months of history - still going and untouched


7 months of history, yay!


over 1 year of history now

13.1 years ago
Jerven ▴ 660

I read this question, and soon you will get better release notes on uniprot.org, as well as historical ones. You can find recent release notes on this beta page. If you have any comments or suggestions, please write to help@uniprot.org.

The number of species is not really in the current release notes. The closest thing is number of distinct taxonomy identifiers.

For 2011_08 there were 304,314 species (query: http://www.uniprot.org/taxonomy/?query=rank%3Aspecies+uniprot%3A(not created%3A[20110921+TO+current])). That increased by 6,294 species in release 2011_09. Here, species are real species according to the NCBI/UniProt taxonomy. If you run this query in the future, these numbers might change a little bit, which is why they will be recorded and saved for the future.
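
If you want to rebuild that kind of query for another cutoff date, a small Python sketch could look like the following. The query text and base URL are copied from the link above. The helper names are hypothetical, and I am assuming the date in the query (20110921) is the release date of the following release. The URL is for the site as it was at the time and may not resolve today.

```python
from urllib.parse import quote_plus

# Base URL taken from the link in the post; it may no longer work.
BASE_URL = "http://www.uniprot.org/taxonomy/"


def species_query(cutoff_date: str) -> str:
    """Query text for species-rank taxa, excluding entries created from cutoff_date on."""
    return f"rank:species uniprot:(not created:[{cutoff_date} TO current])"


def species_query_url(cutoff_date: str) -> str:
    """URL-encode the query text and attach it to the base URL."""
    return f"{BASE_URL}?query={quote_plus(species_query(cutoff_date))}"


print(species_query_url("20110921"))
```

quote_plus also percent-encodes the brackets and parentheses, so the result looks a little different from the link above but carries the same query text.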


+1 for Uniprot responding to community needs raised on BioStar.
