Every PAM substitution matrix on the web begins with a line like
# This matrix was produced by "pam" Version 1.0.6 [28-Jul-93]
or version 1.0.7. Unfortunately, the pam source code tarball at ftp://ftp.ncbi.nih.gov/blast/matrices/ seems to predate even 1.0.6 (among other things, it doesn't emit the comments at the beginning, and it doesn't even include options to generate values for B, X, Z, and *). Try as I might, I can't seem to find the actual source code for pam 1.0.6 or 1.0.7.
Why do I want it? Two reasons. First, I'm teaching a bioinformatics class and introducing substitution matrices. The PAM matrices are great for teaching for many reasons, but I like them because of the evolutionary model on which they are based. I'd like to be able to illustrate their construction and create arbitrary PAM matrices to support a few exercises. Second, I'm mystified that the source would be so hard to find. Bioinformatics researchers are among the most active in promoting reproducible research; we all depend on these matrices, derived from a simple and elegant framework. And yet I'm stumped.
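For the teaching use case, the general Dayhoff-style construction can be sketched without the original program. This is only a minimal illustration, not the NCBI pam code: the three-letter alphabet, the background frequencies, and the PAM1 substitution values below are all made up, and the 10*log10 scaling is an assumption based on Dayhoff's original tables rather than on what pam 1.0.6 or 1.0.7 does. The idea is to build a 1% mutation probability matrix, raise it to the n-th power to get PAM-n, and convert the result to rounded log-odds scores.

```python
import math

# All values here are hypothetical: a toy 3-letter alphabet, not real amino acid data.
ALPHABET = "ABC"
FREQ = [0.5, 0.3, 0.2]  # assumed background frequencies, summing to 1
# Assumed symmetric joint probabilities of observing letters i and j aligned
# after one PAM unit; the off-diagonal total (counted twice) is 0.01, i.e. 1% change.
JOINT = {(0, 1): 0.0025, (0, 2): 0.0010, (1, 2): 0.0015}


def build_pam1(freq, joint):
    """Return the mutation probability matrix m, where m[i][j] = P(j is replaced by i)."""
    n = len(freq)
    m = [[0.0] * n for _ in range(n)]
    for (i, j), q in joint.items():
        m[i][j] = q / freq[j]
        m[j][i] = q / freq[i]
    for j in range(n):
        # Each column must sum to 1: the diagonal holds the probability of no change.
        m[j][j] = 1.0 - sum(m[i][j] for i in range(n) if i != j)
    return m


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(m, exponent):
    """Raise a square matrix to a non-negative integer power by repeated squaring."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = [row[:] for row in m]
    while exponent > 0:
        if exponent & 1:
            result = matmul(result, base)
        base = matmul(base, base)
        exponent >>= 1
    return result


def odds(m_n, freq):
    """Odds ratio: probability of the substitution relative to chance."""
    n = len(freq)
    return [[m_n[i][j] / freq[i] for j in range(n)] for i in range(n)]


def log_odds(m_n, freq):
    """Rounded 10*log10 log-odds scores (scaling is an assumption, see above)."""
    return [[round(10 * math.log10(r)) for r in row] for row in odds(m_n, freq)]


pam1 = build_pam1(FREQ, JOINT)
for distance in (1, 50, 250):
    scores = log_odds(mat_power(pam1, distance), FREQ)
    print(f"PAM{distance}")
    for letter, row in zip(ALPHABET, scores):
        print(letter, row)
```

Because the toy PAM1 matrix is derived from symmetric joint probabilities, the odds ratios stay symmetric at every distance, which is a convenient property to have students verify. Swapping in real amino acid frequencies and accepted point mutation counts would be the natural next exercise.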
Hopefully, I'll just be making a fool of myself in this question and find that the source is obvious and everywhere.
Second, I'm mystified that the source would be so hard to find - I may have become a bit cynical, but considering that we are talking about a program that was run in 1993, I am amazed that you have found anything at all.
Bioinformatics researchers are among the most active in promoting reproducible research - in the same spirit, allow me to correct that: some bioinformatics researchers are very active in promoting reproducible research. See this post on the News page: a major company that has sold sequencers worth hundreds of millions of dollars has simply taken down the site that hosted the software used to process data for their platform.
I had hoped to find something, in part because the 1.0.7 version is dated 13-Aug-03.
As for bioinformatics researchers: point taken :-(
Actually, I believe you should write to the NCBI Help Desk; they will most probably respond positively to your request for the updated code.
I second that - I'd be curious to see whether they can solve this. Aging in bioinformatics is like dog years: one year in bioinformatics is like seven years of real life.
Good suggestion :-) As it turns out, I had written to NCBI at about the same time as my original post, and I've been in email conversation with them since. My colleague Chris Hogue also contacted them separately. I can't quote any of their emails here (because their work for the Federal Government requires approval for that), but there may be something approved in the near future.
In addition to our discussion about the software itself, they did add that the PAM matrices used internally by BLAST are hard-coded in the NCBI toolkit, and that matches against ambiguous characters in the initial protein search are scored as a constant, though the matrix composition step treats them differently. Given the scarcity of B, J, Z, and X characters in protein sequences, there should be hardly any impact.