Where Is Pam 1.0.7 Source (To Generate Pam Substitution Matrices)?
12.3 years ago

Every PAM substitution matrix on the web begins with a line like

# This matrix was produced by "pam" Version 1.0.6 [28-Jul-93]

or version 1.0.7. Unfortunately, the pam source code tarball at ftp://ftp.ncbi.nih.gov/blast/matrices/ seems to predate even 1.0.6: among other things, it doesn't emit the comments at the beginning, and it doesn't include any option to generate values for B, X, Z, and *. Try as I might, I can't find the actual source code for pam 1.0.6 or 1.0.7.

Why do I want it? Two reasons. First, I'm teaching a bioinformatics class and introducing substitution matrices. The PAM matrices are great for teaching for many reasons, but I like them most for the evolutionary model on which they are based. I'd like to be able to illustrate their construction and create arbitrary PAM matrices to support a few exercises. Second, I'm mystified that the source would be so hard to find. Bioinformatics researchers are among the most active in promoting reproducible research; we all depend on these matrices, which are derived from a simple and elegant framework. And yet I'm stumped.
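For the classroom use described above, the construction can be illustrated without the NCBI program at all. The following is only a minimal sketch on a made-up three-letter alphabet, not a reimplementation of pam 1.0.6 or 1.0.7: the frequencies `FREQS`, the exchange rates `EXCH`, and every function name are hypothetical, and the 10·log10 scaling is an assumption exposed as a parameter. It builds a one-step matrix normalized to 1% expected change, raises it to the n-th power, and converts the result to log-odds scores.

```python
import math

# Hypothetical toy data: background frequencies and symmetric exchange
# rates for a three-letter alphabet (not real amino acid data).
FREQS = [0.5, 0.3, 0.2]
EXCH = [[0, 2, 1],
        [2, 0, 1],
        [1, 1, 0]]


def toy_pam1(freqs, exch, target=0.01):
    """Build a row-stochastic one-step matrix with `target` expected change.

    m[i][j] is the probability that residue i is replaced by residue j.
    Off-diagonal entries are exch[i][j] * freqs[j], which makes the chain
    reversible when exch is symmetric, so the final scores are symmetric.
    """
    n = len(freqs)
    raw = [[exch[i][j] * freqs[j] if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    change = sum(freqs[i] * sum(raw[i]) for i in range(n))
    k = target / change
    m = [[k * raw[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        m[i][i] = 1.0 - sum(m[i])  # diagonal is still 0.0 at this point
    return m


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, n):
    """Multiply m by itself n times (plain loop, fine for classroom sizes)."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)]
              for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


def log_odds(pam1, freqs, n, scale=10.0):
    """Unrounded scores scale * log10(M^n[i][j] / freqs[j]), for n >= 1."""
    mn = mat_pow(pam1, n)
    size = len(freqs)
    return [[scale * math.log10(mn[i][j] / freqs[j]) for j in range(size)]
            for i in range(size)]


if __name__ == "__main__":
    pam1 = toy_pam1(FREQS, EXCH)
    for row in log_odds(pam1, FREQS, 250):
        print([round(score) for score in row])
```

Swapping in real mutation data and a 20-letter alphabet changes only the inputs; the matrix power and the log-odds step stay the same.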

Hopefully, I'll just be making a fool of myself in this question and find that the source is obvious and everywhere.

alignment • 3.6k views
  • "Second, I'm mystified that the source would be so hard to find": I may have become a bit cynical, but considering that we are talking about a program that was run in 1993, I am amazed that you have found anything at all.

  • "Bioinformatics researchers are among the most active in promoting reproducible research": in the same spirit, allow me to correct that: some bioinformatics researchers are very active in promoting reproducible research. See this post on the News page: a major company that has sold sequencers worth hundreds of millions of dollars simply took down the site that hosted the software used to process data for their platform.


I had hoped to find something, in part because the 1.0.7 version is dated 13-Aug-03.

As for bioinformatics researchers: point taken :-(


Actually, I believe you should write to the NCBI Help Desk; they will most probably respond positively to your request for the updated code.


I second that. I'd be curious to see whether they can solve this. Aging in bioinformatics is like dog years: one year in bioinformatics is like seven years of real life.


Good suggestion :-) As it turns out, I had written to NCBI at about the same time as my original post. I've been in email conversation with them since. My colleague Chris Hogue also contacted them separately. I can't quote any of their emails here (because their work for the Federal Government requires approval for that), but there may be something approved in the near future.

In addition to our discussion about the software itself, they did add that the internal use of PAM matrices in BLAST is hard-coded in the NCBI toolkit and that matches against ambiguous characters in the initial protein search are treated as a constant, though the matrix composition step treats them differently. Given the scarcity of B, J, Z, and X characters in protein sequences, there should be hardly any impact.

12.3 years ago

After some discussion with the good folks at NCBI, I've gotten this update, which is now posted on the NCBI FTP site. As indicated, the code will be updated by NCBI to match the files in the same directory.

README for blast/matrices.
Last updated 09/10/12.

This directory contains matrices produced by various programs.
These matrices should be useable with BLAST and other programs.

BLAST binaries produced within the last few years store matrix information for BLOSUM45, BLOSUM50, BLOSUM62, BLOSUM80, BLOSUM90, PAM30, PAM70, and PAM250 as part of the program and do not need external files.

The directory also contains a tar ball (pam.tar.gz) with source code to produce the PAM matrices. The output from this program differs from the PAM matrices in this directory. It does not produce comments and does not produce scores for B, Z, X or *. In the near future, the NCBI plans to update this code so as to produce comments similar to those of the PAM matrices in this directory and also to include values for B, Z, X and *.

12.3 years ago

I would use the code you have, which has all the basics needed to learn a lot about substitution matrices. You could even use its lack of features to teach your students how to derive the values for the ambiguous characters B, X, Z and J with these rules:

  • B is asparagine (N) or aspartic acid (D)
  • Z is glutamic acid (E) or glutamine (Q)
  • J is leucine (L) or isoleucine (I)
  • X is any amino acid (unidentified)
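
The rules above can be turned into a short exercise. This is only a sketch under a stated assumption: each ambiguous score is the unweighted mean of the scores of the residues the code stands for, rounded to an integer. The thread does not say how pam 1.0.6 or 1.0.7 actually derives these values (it may weight by residue frequency, for example), and the function name and the `AMBIGUOUS` table are hypothetical.

```python
# Ambiguity codes from the rules above; X is handled separately because it
# expands to the whole alphabet.
AMBIGUOUS = {"B": "ND", "Z": "EQ", "J": "LI"}


def extend_with_ambiguity(scores, alphabet):
    """Add B, Z, J and X rows/columns to a score table.

    scores:   dict mapping (a, b) -> integer score for every pair of
              standard residues in `alphabet`.
    alphabet: string of standard residue letters (must not contain B, Z,
              J or X).
    Assumption: an ambiguous score is the unweighted mean over all the
    residue pairs it could represent. Python's round() sends exact halves
    to the nearest even integer; the original program's rounding rule is
    not known from this thread.
    """
    expand = {a: a for a in alphabet}
    for code, members in AMBIGUOUS.items():
        # Only add a code when both of its residues are in the alphabet.
        if all(m in alphabet for m in members):
            expand[code] = members
    expand["X"] = alphabet

    extended = {}
    for a in expand:
        for b in expand:
            pairs = [scores[(x, y)] for x in expand[a] for y in expand[b]]
            extended[(a, b)] = round(sum(pairs) / len(pairs))
    return extended


if __name__ == "__main__":
    # Made-up scores for a two-letter alphabet, for illustration only.
    toy = {("N", "N"): 4, ("N", "D"): 2, ("D", "N"): 2, ("D", "D"): 4}
    ext = extend_with_ambiguity(toy, "ND")
    print(ext[("B", "N")], ext[("B", "B")], ext[("X", "D")])
```

Comparing the output of such a function against the B, Z and X rows of a published PAM file would show students whether plain averaging matches what the published matrices contain.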

I may in future terms or in related courses; there's no opportunity to squeeze in the changes required for this term.
