For reference, please read this excerpt from "Human non-synonymous SNPs: server and survey" by Vasily Ramensky, Peer Bork, and Shamil Sunyaev:
Profile analysis of homologous sequences. The amino acid replacement may be incompatible with the spectrum of substitutions observed at that position in a family of homologous proteins. PolyPhen identifies homologues of the input sequences via a BLAST (23) search of the NRDB database. The set of aligned sequences with sequence identity to the input sequence in the range 30–94% (inclusive) is used by the new version of the PSIC (position-specific independent counts) software (24) to calculate the so-called profile matrix (http://strand.imb.ac.ru/PSIC/). Elements of the matrix (profile scores) are logarithmic ratios of the likelihood of a given amino acid occurring at a particular site to the likelihood of this amino acid occurring at any site (background frequency). PolyPhen computes the absolute value of the difference between profile scores of both allelic variants in the polymorphic position. PolyPhen also shows the number of aligned sequences at the query position; this may be used to assess the reliability of profile score calculations.
I'd like to calculate something similar to what's described here programmatically (scoring variants by how frequently each amino acid occurs at that position in the aligned sequences), but I can't find any implementation of the system described above.
Does anyone know of a working implementation of this, or of something similar, that's available either as code or as a web service?
Or would it be easy enough to implement something like this ourselves?
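To make the "implement it ourselves" option concrete, here is a minimal sketch of the kind of score the excerpt describes: a log ratio of an amino acid's frequency in one alignment column to its background frequency, and the absolute difference of that score between two alleles. This is not PSIC itself. PSIC weights sequences by their similarity, whereas this sketch uses plain counts with a pseudocount, and it estimates the background from the alignment. The function names, the `pseudocount` parameter, and the input format (a list of equal-length aligned strings with `-` for gaps) are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def background_frequencies(alignment):
    """Smoothed frequency of each amino acid over the whole alignment.

    Add-one smoothing keeps every frequency above zero, so the log ratio
    below is always defined. (Assumption: PSIC uses its own background.)
    """
    counts = Counter(c for seq in alignment for c in seq if c in AMINO_ACIDS)
    total = sum(counts.values()) + len(AMINO_ACIDS)
    return {aa: (counts[aa] + 1) / total for aa in AMINO_ACIDS}


def profile_scores(alignment, position, pseudocount=1.0):
    """Return (scores, n) for one 0-based alignment column.

    scores maps each amino acid to log(column frequency / background
    frequency); n is the number of non-gap residues in the column.
    """
    background = background_frequencies(alignment)
    column = [seq[position] for seq in alignment if seq[position] in AMINO_ACIDS]
    counts = Counter(column)
    n = len(column)
    scores = {}
    for aa in AMINO_ACIDS:
        # Pseudocounts are spread according to the background frequencies.
        freq = (counts[aa] + pseudocount * background[aa]) / (n + pseudocount)
        scores[aa] = math.log(freq / background[aa])
    return scores, n


def score_variant(alignment, position, ref, alt, pseudocount=1.0):
    """Absolute profile-score difference between two alleles, plus n."""
    scores, n = profile_scores(alignment, position, pseudocount)
    return abs(scores[ref] - scores[alt]), n


if __name__ == "__main__":
    # Toy alignment, invented purely for this example.
    toy = ["MKVLA", "MKVLA", "MRVLA", "MK-LA"]
    delta, n = score_variant(toy, 1, "K", "R")
    print(f"|score(K) - score(R)| = {delta:.3f} from {n} aligned residues")
```

Because it only needs an alignment in memory, a function like this can be called in a loop over thousands of variants. Building the alignments (the BLAST search and the 30–94% identity filter from the excerpt) is a separate step that this sketch does not cover.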
The form on the above page triggers http://strand.imb.ac.ru/PSIC-cgi/run.pl so that Perl script probably has the code you're looking for. Maybe mail the webmaster (vlasov@imb.imb.ac.ru) or the authors of the article for a copy of that code?
I want to do this programmatically so that I can run the scoring thousands of times. Doing it manually won't do, and scripting the web form with curl seems hackish, unreliable, and sensitive to change.