*EDIT: for anyone interested, I expanded on this answer in a blog post*
I don't know of a server or database for which this information has been pre-computed or can be retrieved by a search.
However, I think it is not too much work to craft a solution using a few tools. I'd do something like this:
(1) First, retrieve sequences of PDB chains in FASTA format:
wget ftp://ftp.ncbi.nih.gov/blast/db/FASTA/pdbaa.gz
gunzip pdbaa.gz
(2) Next, cluster the sequences using CD-HIT, choosing a high value for -c so as to obtain clusters of highly similar sequences:
cd-hit -i pdbaa -o pdb99 -c 0.99 -n 5
Example cluster:
>Cluster 13
0 1676aa, >gi|319443753|pdb|3PRX|A... *
1 1676aa, >gi|190016356|pdb|3CU7|A... at 99.94%
(3) Parse that file to extract the GIs or PDB IDs from each cluster and create a new FASTA file with sequences for each cluster.
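For step 3, something along these lines might work as a starting point. It is only a sketch: it assumes the member lines in the CD-HIT .clstr file look like the example cluster above (an index, the length, then an NCBI-style `>gi|...|pdb|ID|chain` header), and `parse_clstr` and `MEMBER_RE` are names made up for illustration, not part of CD-HIT.

```python
import re
from collections import defaultdict

# Pulls GI, PDB ID and chain out of a member line such as
# "0  1676aa, >gi|319443753|pdb|3PRX|A... *" (header layout is an assumption).
MEMBER_RE = re.compile(r">gi\|(\d+)\|pdb\|(\w{4})\|(\w*)")


def parse_clstr(lines):
    """Return {cluster_number: [(gi, pdb_id, chain), ...]} from .clstr lines."""
    clusters = defaultdict(list)
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">Cluster"):
            # Header line, e.g. ">Cluster 13"
            current = int(line.split()[1])
        elif current is not None:
            m = MEMBER_RE.search(line)
            if m:
                clusters[current].append(m.groups())
    return dict(clusters)


# The example cluster from above, used as test input.
example = """>Cluster 13
0\t1676aa, >gi|319443753|pdb|3PRX|A... *
1\t1676aa, >gi|190016356|pdb|3CU7|A... at 99.94%
""".splitlines()

clusters = parse_clstr(example)

# Only clusters with two or more members can contain a near-identical pair.
multi = {num: members for num, members in clusters.items() if len(members) > 1}
print(multi)
```

In practice you would pass the open .clstr file handle instead of `example`, then use the IDs in `multi` to pull the matching sequences out of pdbaa into one FASTA file per cluster.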
(4) Then, do an all-versus-all global alignment for each new FASTA file, using something like needleall from the EMBOSS suite.
needleall -aformat3 pair -stdout -auto -asequence cluster1.fa \
-bsequence cluster1.fa > cluster1.needleall
(This is a bit wasteful: for two sequences A and B, needleall will generate 4 alignments (AA, AB, BA and BB), so each chain is aligned to itself and each pair is aligned twice. But you get the general idea.)
A portion of the alignment file (for the first case of chain aligned to self):
# Aligned_sequences: 2
# 1: 1VS5R
# 2: 1VS5R
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 75
# Identity: 75/75 (100.0%)
# Similarity: 75/75 (100.0%)
# Gaps: 0/75 ( 0.0%)
# Score: 385.0
(5) Parse the alignment output to extract the PDB IDs of sequence 1 and sequence 2 for pairs where, for example:
Length = 100
Identity = 99/100
# therefore the two chains differ at 1 aligned position (a substitution or a single gap)
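A rough sketch of step 5, assuming the header fields appear in the order shown in the alignment excerpt above (`# 1:`, `# 2:`, `# Length:`, `# Identity:`, `# Gaps:`). The function names are made up for illustration, and the second record in the test input uses hypothetical chain IDs (`1ABCA`, `2XYZA`) and made-up numbers, not real results.

```python
import re

# Header fields of interest in EMBOSS "pair" format output.
FIELD_RE = re.compile(r"^#\s*(1|2|Length|Identity|Gaps):\s*(\S+)")


def parse_needle_headers(lines):
    """Yield one dict per alignment: seq1, seq2, length, identity, gaps."""
    record = {}
    for line in lines:
        m = FIELD_RE.match(line)
        if not m:
            continue
        key, value = m.groups()
        if key in ("1", "2"):
            record["seq" + key] = value
        elif key == "Length":
            record["length"] = int(value)
        else:
            # "Identity: 75/75" or "Gaps: 0/75": keep the count before the slash
            record[key.lower()] = int(value.split("/")[0])
        if key == "Gaps":
            # Gaps is the last field we need, so the record is complete
            yield record
            record = {}


def single_difference_pairs(records):
    """Return {((id_a, id_b), kind)} for pairs differing at exactly 1 position."""
    pairs = set()
    for r in records:
        if r["seq1"] != r["seq2"] and r["length"] - r["identity"] == 1:
            kind = "gap" if r["gaps"] == 1 else "substitution"
            # Sorting the IDs collapses the duplicate AB / BA alignments
            pairs.add((tuple(sorted((r["seq1"], r["seq2"]))), kind))
    return pairs


# First record: the self-alignment shown above. Second: hypothetical pair.
example = """# 1: 1VS5R
# 2: 1VS5R
# Length: 75
# Identity:      75/75 (100.0%)
# Gaps:           0/75 ( 0.0%)
# 1: 2XYZA
# 2: 1ABCA
# Length: 100
# Identity:      99/100 (99.0%)
# Gaps:           0/100 ( 0.0%)
""".splitlines()

result = single_difference_pairs(parse_needle_headers(example))
print(result)
```

Self-alignments are skipped by comparing the two IDs, and recording whether the single difference is a gap or a substitution lets you separate the two cases later.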
Awesome answer and blog post. I'm curious, how many mutant pairs does this find?
Added that info at end of blog post. There are 12 912 pairs of PDB chains that differ by 1 residue. Of those, 1 914 pairs differ due to one gap in the alignment; the other 10 998 are due to 1 amino acid change.
Thanks neilfws for the nice answer and the blog post. It looks like a good start for approaching the problem. I should have mentioned in my initial post that I'm specifically interested in the structural perspective, i.e. how much two structures deviate as a result of a single amino acid exchange. Since the SEQRES sequence might not be equal to the sequence in the ATOM records, there might be more work involved.
True. In which case I guess you'd want to extract sequence from the PDB record (or something derived from it). So, parsing the whole PDB it is!