Question

Retrieve all distinct domains from PDB (or lots of them)?

7.8 years ago

rayoub ▴ 110

I would like to understand how the PDB manages domains. My understanding is that the internal folding of a domain is independent of the inter-domain contacts that may occur in the final 3D protein structure. If this is the case, why are domains that recur across a variety of proteins repeated in every PDB file in which they occur, rather than stored once in some kind of non-redundant representation and referenced? Is there a way to obtain a non-redundant representation of the three-dimensional structures of protein domains in terms of internal coordinates?

PDB Protein Domains • 2.5k views

ADD COMMENT • link updated 7.8 years ago by andreas.prlic ▴ 290 • written 7.8 years ago by rayoub ▴ 110

7.8 years ago

Petr Ponomarenko ★ 2.8k

PDB holds experimental structures. Technically, PDB holds physical models built by the authors from experimental electron density maps (the Uppsala repository holds these maps for PDB structures: http://eds.bmc.uu.se/eds/). Because of this, structures of the same domain in different models are different. When you see several structures of the same protein in PDB, chances are high that these structures were solved with different protein sequences, small molecules (ligands, drugs), and heavy metals. All of this affects the experimental structures in PDB.

Your idea about having a database of non-redundant 3D structures for domains is interesting. There are some like this, made for benchmarking small-molecule docking. But then the question is what these structures are going to be used for, because the way you optimize structures is going to be different for different tasks. Analysis of mutations is very different from drug design.

ADD COMMENT • link 7.8 years ago by Petr Ponomarenko ★ 2.8k

What is the nature of the differences? My knowledge is somewhat elementary here, having only read in intro textbooks that domains fold independently. Are they only slightly different per large molecular complex, based on inter-chain contacts or binding to ligands? Or are the torsion angles all the same, with the xyz coordinates naturally differing simply because of the positioning in the unit cell among different sorts of complexes? How about full chains, which may include multiple domains? Are those identical structures when they reoccur in different PDB macromolecular entities? Sorry for the deluge of questions; you did point me in the right direction and I will mark this as correct regardless.

ADD REPLY • link 7.8 years ago by rayoub ▴ 110

As I said, when you see several structures of the same protein in PDB, chances are high that these structures were solved with different protein sequences, small molecules (ligands, drugs), and heavy metals. All of this affects the experimental structures in PDB.

The differences are small when you look at the local superposition of a few consecutive amino acids, especially the RMSD of their Calpha atoms. But these small differences accumulate when you look at a few dozen amino acids, so when you superimpose domains the center will look very well aligned, while the outer parts will look similar but will be slightly shifted relative to each other as a group. This shift is a result of differences in sequence, ligands, heavy metals, and even the way programs were used to fit the protein structure into the experimental electron density map. The difference is not a simple translation of the coordinate origin.

The best way to understand what is going on is to install software for protein superposition and viewing (I use ICM Browser by Molsoft). Then select the protein you are interested in and find all PDB structures for it. Look up its domains (for example in UniProt), superimpose by protein, by chain, and by domain, and see how the structures differ and why.
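If you want a quick numerical check before installing a viewer, here is a minimal sketch of one way to separate "same shape, different position" from a real structural difference. It compares all pairwise distances within each model (a distance RMSD), which does not change under rotation or translation, so no superposition step is needed. The coordinates are made up for illustration and the function names are my own, not from any PDB tool; in practice you would extract matching Calpha atoms from two entries first.

```python
import math


def pairwise_distances(coords):
    """Distances between all pairs i < j of (x, y, z) points."""
    return [
        math.dist(coords[i], coords[j])
        for i in range(len(coords))
        for j in range(i + 1, len(coords))
    ]


def drmsd(coords_a, coords_b):
    """Distance RMSD between two equally long lists of (x, y, z) tuples."""
    if len(coords_a) != len(coords_b):
        raise ValueError("both models need the same number of atoms")
    da = pairwise_distances(coords_a)
    db = pairwise_distances(coords_b)
    if not da:
        raise ValueError("need at least two atoms")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(da, db)) / len(da))


def rotate_z_and_shift(coords, angle, shift):
    """Rigid motion: rotate about the z axis by `angle` radians, then translate."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in coords
    ]


# Invented Calpha-like coordinates in angstroms (not taken from any PDB entry).
model_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.5, 3.4, 0.0), (9.0, 4.0, 1.5)]

# Same shape placed differently, as if it sat elsewhere in a unit cell.
moved = rotate_z_and_shift(model_a, 0.7, (10.0, -4.0, 2.5))

# Genuinely different shape: the last atom is displaced by 1.5 angstroms.
deformed = model_a[:-1] + [(9.0, 4.0, 3.0)]

print("rigid motion only:", drmsd(model_a, moved))
print("real deformation: ", drmsd(model_a, deformed))
```

The first value should come out as zero up to floating-point noise, the second clearly above zero. A superposition-based RMSD in a viewer tells you more (it shows where the models diverge), but this kind of check is enough to confirm that the differences are not just a change of coordinate frame.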

ADD REPLY • link 7.8 years ago by Petr Ponomarenko ★ 2.8k

score 2 · Accepted Answer · 2017-02-16

7.8 years ago

andreas.prlic ▴ 290

Sounds like you need a structural classification. You could take a look at ECOD, SCOP, or CATH.

ADD COMMENT • link 7.8 years ago by andreas.prlic ▴ 290

I agree. I think understanding CATH and how it breaks down and clusters these units is the key. The only trouble is that I was hoping a domain would always have the same, or close to the same, torsion angles wherever it occurs. Actually, I would be satisfied if all intra-domain contacts were always the same across molecular complexes. I'm not sure whether that is the case. I will test it on a few examples from PDB, but even if I find they are equal my general theoretical knowledge will still be lacking, since that may not always hold.
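For the test on a few examples, a rough sketch of the two quantities in question might look like this: a signed torsion angle from four atom positions, and a residue contact set whose overlap between two models is scored with a Jaccard index. The 8 angstrom cutoff and the minimum sequence separation of 3 are assumptions I picked for illustration, and the coordinates are invented, not taken from PDB.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def contact_set(coords, cutoff=8.0, min_separation=3):
    """Index pairs (i, j) closer than `cutoff`, at least `min_separation` apart in sequence."""
    return {
        (i, j)
        for i in range(len(coords))
        for j in range(i + min_separation, len(coords))
        if math.dist(coords[i], coords[j]) < cutoff
    }


def jaccard(a, b):
    """Shared fraction of two contact sets (1.0 means identical)."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


# Invented six-residue hairpin, and the same chain with its last residue swung away.
closed = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
          (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
opened = closed[:-1] + [(0.0, 12.0, 0.0)]

print("torsion:", dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)))
print("contact overlap:", jaccard(contact_set(closed), contact_set(opened)))
```

For real backbone torsions you would feed in consecutive N, CA, C atoms of matching residues from two PDB entries and compare the angles residue by residue; the contact overlap then gives a single number per domain pair.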

ADD REPLY • link 7.8 years ago by rayoub ▴ 110

Torsion angles defined on Calpha atoms will be a bit better preserved, but overall the structures of domains are not the same even for the same protein in PDB, since there will be differences in sequence, ligands, and the software used to build the 3D protein structure from the electron density map.

ADD REPLY • link 7.8 years ago by Petr Ponomarenko ★ 2.8k

That is very interesting. From what I can tell, then, if one were to mine PDB structures to predict contacts or conformations, the best level to do this at is the chain, since domains within chains will have different conformations depending on the chain (sequence) they are part of.

But I fear it may not be as simple as that, since ligands are not part of peptide chains per se and nonetheless have an effect on the conformations of the chains they contact. Hmmm.

There are many papers out there using ML and statistical techniques to predict conformations from sequence and known structural data, but I have not seen these considerations addressed, and perhaps the mining of structural information is not as simple as it may seem at first. It seems that what I have learned on this thread is that a sequence without its context is not meaningful, and that Anfinsen's principle applies only to the full biological unit and not at the domain or even chain level.

ADD REPLY • link 7.8 years ago by rayoub ▴ 110

You asked if the structures are different. They are different in PDB, which is a repository of molecular models from crystallization and NMR experiments. PDB structures have changes in sequence, extra ligands added, and sometimes heavy atoms to help the crystallization process.

You are now switching to questions about protein structure in a living organism, where the sequence is "original", there are chaperones and other machinery to guide folding, and there are no extra ligands or heavy atoms added, because a living organism has no intention to crystallize the protein.

When you want to predict the function of a protein, its folding, or its interactions with other proteins or drugs, you want to do it for a "living organism", and the PDB structure is only a starting point. In the end, the structure of a protein there is shaking, because it is at a relatively high temperature compared to the crystallized state. Moreover, some parts of the protein do not have a stable structure at all and are always moving around =)

ADD REPLY • link 7.8 years ago by Petr Ponomarenko ★ 2.8k

Head exploding now. Your answer is very informative, and the exploding is not due to misunderstanding but to understanding that this is more complicated than I initially thought for a CS/math person to address in a vacuum. When CASP puts out a structure target for folks to try to predict because it will soon be crystallized, I was under the impression that they were predicting the protein conformation with respect to the newly submitted PDB entry. Perhaps that is the case, and what you are saying is either that its biological relevance is limited, or that it is the best one can do in terms of structure prediction because conformations in solution are not so easily obtained. In that case, this sort of prediction of the crystallized form is only a guide to the molecular biologist investigating the real structure and function of the entire complex. Sorry to ask such an extended question, but I think this is very close to a final summation of my misunderstanding on this point. Thanks again for your help; it is very much appreciated.

ADD REPLY • link 7.8 years ago by rayoub ▴ 110

Could you please clarify your question?

ADD REPLY • link 7.8 years ago by Petr Ponomarenko ★ 2.8k

Rather than clarify my question, I'm going to look at the ICM Browser and do as you instruct below. If that is not free, I'll try it with PyMOL. I haven't used either yet. I think you are right that I've gone as far as I can with juggling the concepts in the abstract, and I need to put my head back down and see what I find in the direction you indicated.

As an aside, I'm not sure what you mean by 'differences in sequence'. I thought a sequence is what defines a protein. I'm assuming you mean differences in the overall sequence a 'domain' is found in.

ADD REPLY • link 7.8 years ago by rayoub ▴ 110

I meant that protein sequences in PDB are usually truncated, mutated, or have insertions and deletions compared to what you would normally see in the living organism you study (for example in RefSeq).
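To make that concrete, here is a toy sketch that lists where a crystallized construct differs from a reference sequence. Both sequences are invented for illustration, and `difflib.SequenceMatcher` from the Python standard library is a generic text diff rather than a biological alignment, so treat it only as a quick way to see truncations and substitutions, not as a substitute for a proper pairwise aligner.

```python
from difflib import SequenceMatcher

# Invented sequences: the construct is cut at both ends and carries one substitution.
reference = "MKTAYIAKQRQISFVKSHFSRQ"
construct = "AYIAKQAQISFVKSHF"


def sequence_differences(ref, obs):
    """Return (operation, reference segment, observed segment) for every non-matching region."""
    matcher = SequenceMatcher(None, ref, obs, autojunk=False)
    return [
        (tag, ref[i1:i2], obs[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


for tag, ref_part, obs_part in sequence_differences(reference, construct):
    print(tag, repr(ref_part), "->", repr(obs_part))
```

Here the missing residues at both ends show up as `delete` operations and the single changed residue as a `replace`.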

ADD REPLY • link 7.8 years ago by Petr Ponomarenko ★ 2.8k