Question

Easy Retrieval of Mutant PDB Structures

2

13.5 years ago

Chris ★ 1.6k

I'm looking for an automated way to find all mutant structures of a given structure. More specifically, given a PDB ID, the task is to return all PDB IDs whose sequences differ from it by exactly one amino acid. I already know about the MutaProt server, but it seems to be outdated (last updated in 2006). Are there other, more recent servers? I'm sure I could get the task done by parsing the whole PDB, but I'd rather avoid that. Thanks.

pdb mutation protein protein structure • 5.4k views

ADD COMMENT • link updated 13.4 years ago by Neilfws 49k • written 13.5 years ago by Chris ★ 1.6k

score 4 · Answer 1 · 2012-02-01

4

13.5 years ago

Neilfws 49k

*EDIT: for anyone interested, I expanded on this answer in a blog post*

I don't know of a server or database for which this information has been pre-computed or can be retrieved by a search.

However, I think it is not too much work to craft a solution using a few tools. I'd do something like this:

(1) First, retrieve sequences of PDB chains in FASTA format:

wget ftp://ftp.ncbi.nih.gov/blast/db/FASTA/pdbaa.gz
gunzip pdbaa.gz

(2) Next, cluster the sequences using CD-HIT, choosing a high value for -c so as to obtain clusters of highly similar sequences:

cd-hit -i pdbaa -o pdb99 -c 0.99 -n 5

Example cluster:

>Cluster 13
0       1676aa, >gi|319443753|pdb|3PRX|A... *
1       1676aa, >gi|190016356|pdb|3CU7|A... at 99.94%

(3) Parse that file to extract the GIs or PDB IDs from each cluster and create a new FASTA file with sequences for each cluster.
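For step (3), here is a minimal Python sketch of the cluster parsing, assuming the `.clstr` layout shown in the example cluster above and `gi|...|pdb|ID|chain` identifiers. The function names `parse_clstr` and `pdb_chain` are made up for illustration. Depending on CD-HIT's description-length setting, identifiers in the `.clstr` file may be truncated, so check your own output first.

```python
import re
from collections import defaultdict

# Captures the identifier between '>' and '...' on a member line, e.g.
# "0       1676aa, >gi|319443753|pdb|3PRX|A... *"
MEMBER_RE = re.compile(r">(.+?)\.\.\.")

def parse_clstr(lines):
    """Map each cluster name to the list of sequence identifiers in it."""
    clusters = defaultdict(list)
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = line[1:]  # e.g. "Cluster 13"
        elif current is not None:
            match = MEMBER_RE.search(line)
            if match:
                clusters[current].append(match.group(1))
    return dict(clusters)

def pdb_chain(identifier):
    """Turn 'gi|319443753|pdb|3PRX|A' into '3PRX_A' (assumes this header layout)."""
    fields = identifier.split("|")
    idx = fields.index("pdb")
    return f"{fields[idx + 1]}_{fields[idx + 2]}"

example = """>Cluster 13
0       1676aa, >gi|319443753|pdb|3PRX|A... *
1       1676aa, >gi|190016356|pdb|3CU7|A... at 99.94%
""".splitlines()

clusters = parse_clstr(example)
# Only clusters with at least two members can contain a mutant pair.
multi = {name: [pdb_chain(m) for m in members]
         for name, members in clusters.items() if len(members) > 1}
print(multi)  # {'Cluster 13': ['3PRX_A', '3CU7_A']}
```

From there, writing one FASTA file per cluster is a matter of looking up each identifier in the original pdbaa file.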

(4) Then, do an all-versus-all global alignment for each new FASTA file, using something like needleall from the EMBOSS suite.

needleall -aformat3 pair -stdout -auto -asequence cluster1.fa \
          -bsequence cluster1.fa > cluster1.needleall

(This is a bit wasteful, since for two sequences A and B needleall will generate 4 alignments: the self-alignments AA and BB, plus the same pair in both directions, AB and BA. But you get the general idea.)
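If the redundancy matters, one option is to enumerate each unordered pair once and align only those. A small sketch of the enumeration (the chain labels are placeholders, and `unique_pairs` is just an illustrative name):

```python
from itertools import combinations

def unique_pairs(ids):
    """Return each unordered pair of distinct IDs exactly once."""
    return list(combinations(ids, 2))

chains = ["A", "B", "C"]
pairs = unique_pairs(chains)
print(pairs)  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(len(chains) ** 2, "all-versus-all alignments vs", len(pairs), "unique pairs")
```

For a cluster of n chains this cuts the work from n*n alignments to n*(n-1)/2, at the cost of running the aligner once per pair rather than once per file.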

A portion of the alignment file (for the first case of chain aligned to self):

# Aligned_sequences: 2
# 1: 1VS5R
# 2: 1VS5R
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 75
# Identity:      75/75 (100.0%)
# Similarity:    75/75 (100.0%)
# Gaps:           0/75 ( 0.0%)
# Score: 385.0

(5) Parse the alignment output to extract the PDB IDs of sequence 1 and sequence 2 for every alignment where, for example:

Length = 100
Identity = 99/100
# therefore differ by 1 amino acid
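Step (5) could look roughly like the Python sketch below. It assumes the `#`-prefixed header fields shown in the alignment excerpt above, with `# Score` as the last field of each header. The second record in the example (`CHAIN1` vs `CHAIN2`) is invented test data, not a real PDB entry. The `Gaps` field is used to tell a single gap apart from a single substitution.

```python
import re

FIELD_RE = {
    "seq1": re.compile(r"^# 1:\s+(\S+)"),
    "seq2": re.compile(r"^# 2:\s+(\S+)"),
    "length": re.compile(r"^# Length:\s+(\d+)"),
    "identity": re.compile(r"^# Identity:\s+(\d+)/"),
    "gaps": re.compile(r"^# Gaps:\s+(\d+)/"),
}

def parse_headers(lines):
    """Collect one dict per alignment from needle/needleall 'pair' output."""
    records, current = [], {}
    for line in lines:
        if line.startswith("# Aligned_sequences"):
            current = {}
            continue
        for key, pattern in FIELD_RE.items():
            m = pattern.match(line)
            if m:
                value = m.group(1)
                current[key] = value if key in ("seq1", "seq2") else int(value)
        # '# Score' closes the header block in the excerpt above.
        if line.startswith("# Score") and FIELD_RE.keys() <= current.keys():
            records.append(current)
            current = {}
    return records

def single_differences(records):
    """Return (seq1, seq2, kind) for non-self pairs that differ at one position."""
    hits = []
    for r in records:
        if r["seq1"] != r["seq2"] and r["length"] - r["identity"] == 1:
            kind = "gap" if r["gaps"] == 1 else "substitution"
            hits.append((r["seq1"], r["seq2"], kind))
    return hits

example = """# Aligned_sequences: 2
# 1: 1VS5R
# 2: 1VS5R
# Length: 75
# Identity:      75/75 (100.0%)
# Gaps:           0/75 ( 0.0%)
# Score: 385.0
# Aligned_sequences: 2
# 1: CHAIN1
# 2: CHAIN2
# Length: 100
# Identity:      99/100 (99.0%)
# Gaps:           0/100 ( 0.0%)
# Score: 500.0
""".splitlines()

hits = single_differences(parse_headers(example))
print(hits)  # [('CHAIN1', 'CHAIN2', 'substitution')]
```

Because the all-versus-all run reports each pair in both directions, you would probably also want to deduplicate on the sorted pair of IDs.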

ADD COMMENT • link 13.5 years ago by Neilfws 49k

0

Awesome answer and blog post. I'm curious, how many mutant pairs does this find?

ADD REPLY • link 13.5 years ago by Keith Callenberg ▴ 960

0

Added that info at end of blog post. There are 12 912 pairs of PDB chains that differ by 1 residue. Of those, 1 914 pairs differ due to one gap in the alignment; the other 10 998 are due to 1 amino acid change.

ADD REPLY • link 13.5 years ago by Neilfws 49k

0

Thanks, neilfws, for the nice answer and for the blog post. It looks like a good starting point. I should have mentioned in my initial post that I'm specifically interested in the structural side of the problem, i.e. how two structures deviate from each other because of a single amino acid exchange. Since the SEQRES sequence might not match the sequence in the ATOM records, there might be more work involved.

ADD REPLY • link 13.5 years ago by Chris ★ 1.6k

0

True. In which case I guess you'd want to extract sequence from the PDB record (or something derived from it). So, parsing the whole PDB it is!
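As a rough idea of what that involves, here is a Python sketch that reads the observed sequence of each chain from the C-alpha ATOM records, using the fixed columns of the PDB file format. It is only a sketch under simplifying assumptions: first model only, standard residues only (anything else becomes `X`), and no handling of modified residues stored as HETATM. The helper `ca_line` exists only to build toy input and is not part of any library.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def atom_sequences(pdb_lines):
    """Return {chain ID: one-letter sequence} built from CA ATOM records."""
    seqs, seen = {}, set()
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # stop after the first model
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        chain = line[21]
        residue_key = (chain, line[22:27])  # residue number + insertion code
        if residue_key in seen:
            continue  # skip alternate locations of the same residue
        seen.add(residue_key)
        seqs[chain] = seqs.get(chain, "") + THREE_TO_ONE.get(line[17:20], "X")
    return seqs

def ca_line(serial, resname, chain, resseq):
    """Build a toy CA ATOM line with the fields in their PDB columns."""
    return f"ATOM  {serial:>5}  CA  {resname} {chain}{resseq:>4}    "

example = [
    ca_line(1, "MET", "A", 1),
    ca_line(2, "LYS", "A", 2),
    ca_line(3, "VAL", "A", 3),
]
print(atom_sequences(example))  # {'A': 'MKV'}
```

Residues missing from the coordinates simply do not appear in this sequence, which is exactly where it can differ from SEQRES.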

ADD REPLY • link 13.5 years ago by Neilfws 49k