UniProt pages show the statistics for the most recent release only.
Right now it's version 2011_09.
Where can I find the stats for 2011_08?
In particular, I need the number of species.
When I wrote this: http://blog.fuzzierlogic.com/archives/425, the historical stats of SwissProt/TrEMBL weren't very publicly accessible. This is a situation that has, if anything, got worse. In my post I link to a page that at least had the sizes of SwissProt for old releases, but even that has now disappeared with the redesign of the ExPASy homepages.
In the tarball that you download for old releases (e.g. release 2011_08.tgz, 6.9GB), there is apparently a docs/ directory, and the information you need may be in there somewhere. It does seem crazy to download nearly 7GB of data just to find this simple figure, though.
EDIT: Yes, the docs/ directory contains speclist.txt (the same file is in the current release), which I think has the information you're after.
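Because speclist.txt is a single small file inside that archive, GNU tar can extract just that member instead of unpacking everything. You still have to download the whole archive, and the gzip stream is still decompressed end to end, but only the matching file is written to disk. A minimal sketch, assuming GNU tar and a gzip-compressed tar archive; the function name and the archive file name in the example call are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: extract only docs/speclist.txt from an old UniProt release archive
# instead of unpacking all ~7GB. Requires GNU tar (for --wildcards).

# extract_speclist ARCHIVE DEST: unpack any member whose path ends in
# docs/speclist.txt into DEST, keeping its path from inside the archive.
extract_speclist() {
    local archive="$1" dest="$2"
    mkdir -p "$dest" || return 1
    # --wildcards must come before the pattern it applies to; the leading *
    # also matches the archive's top-level directory name, whatever it is.
    tar -xzf "$archive" -C "$dest" --wildcards '*docs/speclist.txt'
}

# Example (hypothetical archive name):
#   extract_speclist knowledgebase2011_08.tar.gz ./release-2011_08
```

Matching on a pattern rather than an exact member path avoids having to know the name of the top-level directory inside the archive in advance.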
After reading @Simon's answer, I wrote a cron script that saves the UniProt stats history and pushes it to GitHub Pages: http://alevchuk.github.com/uniprot-stats-history
I'm now running this script on one of our servers. It mirrors the UniProt pages every Wednesday, and if anything has changed, the changes are pushed to GitHub automatically.
If you are interested in how it works, the code is on my GitHub profile.
So far my archiving script has been doing its job without any human intervention for 3 months: http://alevchuk.github.com/uniprot-stats-history/
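The actual code is on the author's GitHub profile; what follows is not that code, only a hypothetical sketch of the same idea: fetch a page on a weekly schedule, keep the new copy only if it differs from the stored one, and commit and push when it does. The crontab schedule, function names, and repository layout are all assumptions:

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a weekly mirroring job (not the author's script).
# Example crontab entry (minute hour day-of-month month day-of-week);
# day-of-week 3 means Wednesday:
#   0 3 * * 3  $HOME/bin/mirror_stats.sh
set -u

# update_if_changed NEW TARGET: replace TARGET with NEW unless the two files
# are byte-identical. Returns 0 if TARGET was written, nonzero otherwise.
update_if_changed() {
    local new="$1" target="$2"
    if [ -f "$target" ] && cmp -s "$new" "$target"; then
        rm -f "$new"            # unchanged: discard the fresh copy
        return 1
    fi
    mv -f "$new" "$target"
}

# snapshot REPO URL NAME: download URL to REPO/NAME and, only if the content
# changed, commit and push it. Assumes REPO is a git checkout whose push
# remote (e.g. a gh-pages branch) is already configured.
snapshot() {
    local repo="$1" url="$2" name="$3" tmp
    tmp=$(mktemp) || return 1
    if ! curl -fsS -o "$tmp" "$url"; then
        rm -f "$tmp"            # failed download: keep the last good copy
        return 1
    fi
    if update_if_changed "$tmp" "$repo/$name"; then
        git -C "$repo" add -- "$name" &&
            git -C "$repo" commit -q -m "Snapshot of $url ($(date +%F))" &&
            git -C "$repo" push -q
    fi
}
```

The cron job would call snapshot once per mirrored page. Downloading to a temporary file first means an HTTP error or a network failure never overwrites the last stored copy, and comparing before committing keeps the git history limited to weeks in which the page actually changed.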
I read this question, and soon you will get better release notes for uniprot.org, as well as historical ones. You can find the recent release notes on this beta page. If you have any comments or suggestions, please write to help@uniprot.org.
The number of species is not really in the current release notes. The closest thing is number of distinct taxonomy identifiers.
For 2011_08 there were 304,314 species (http://www.uniprot.org/taxonomy/?query=rank%3Aspecies+uniprot%3A(not created%3A[20110921+TO+current])), and this increased by 6,294 species in release 2011_09. Here, species means real species according to the NCBI/UniProt taxonomy. If you run this query in the future, these numbers might change slightly, which is why they will be recorded and saved for the future.
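Taking the two figures in this answer at face value (they come from a live query, so they may drift as noted), the implied species count for release 2011_09 is:

```latex
N_{\text{2011\_09}} = N_{\text{2011\_08}} + \Delta N = 304{,}314 + 6{,}294 = 310{,}608
```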
I already extracted the 7GB, but wasn't sure how best to get the count out of speclist.txt:
grep -c ": N=" docs/speclist.txt
gives 20,623 for both 2011_08 and the current release. Wasn't TrEMBL close to 400k species? http://biocluster.ucr.edu/~alevchuk/projects/036-thesis/knowledgebase2011_08/docs/speclist.txt
Speclist does not really have the information that you are looking for. It's a list that maps taxonomy codes to human-readable shorthand. However, many (most) of these taxonomy codes are not at the species rank, so counting them gives an answer that is off by orders of magnitude.
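If the figure you want is the number of distinct taxonomy identifiers (the closest thing the release notes offer, as mentioned earlier), one alternative to speclist.txt is to count the distinct NCBI_TaxID values on the OX lines of the release's flat files. A hedged sketch, assuming the UniProtKB flat-file OX line format ("OX   NCBI_TaxID=9606;", possibly followed by evidence tags in braces) and assuming the archive contains uniprot_sprot.dat.gz and uniprot_trembl.dat.gz; those file names and the function name are assumptions. Note that this counts identifiers at every rank (subspecies and strains included), so it is still not a species-rank count:

```bash
#!/usr/bin/env bash
# Sketch: count distinct NCBI taxonomy identifiers referenced by the OX
# lines of UniProtKB flat-file entries read on stdin. Taxa of every rank
# are counted, not species only.
count_taxids() {
    awk '
        /^OX   / {
            line = substr($0, 6)              # data starts in column 6
            gsub(/\{[^}]*\}/, "", line)       # drop {ECO:...} evidence tags
            sub(/^NCBI_TaxID=/, "", line)
            # An OX line may list several IDs, e.g. "9606, 9598;"
            n = split(line, ids, /[^0-9]+/)
            for (i = 1; i <= n; i++) {
                if (ids[i] != "") { seen[ids[i]] = 1 }
            }
        }
        END {
            count = 0
            for (id in seen) { count++ }
            print count
        }
    '
}

# Example (file names are assumptions about the release layout):
#   gzip -dc uniprot_sprot.dat.gz uniprot_trembl.dat.gz | count_taxids
```

Streaming through gzip -dc avoids writing the decompressed flat files to disk; the awk array holds one key per distinct identifier, which stays small compared with the data itself.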