I am a former NMR spectroscopist. You should check the comments in the PDB file. If the authors don't call out a specific model as the best representative of the family, then you can usually assume that it is the first one. The authors have access to the underlying constraint data, which you do not. In the lab I was trained in, we picked the model that best satisfied all of the constraints (i.e., lowest energy in the final minimization step) unless something really odd happened, and if something really odd happened, you'd definitely research it and put something about it in the comments when you submitted to the PDB. But other labs did different things. I don't know if the field has standardized on how to pick a model since I left it.
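To make "check the comments" concrete, here is a minimal Python sketch. It assumes the entry carries a REMARK 210 line of the form `BEST REPRESENTATIVE CONFORMER IN THIS ENSEMBLE : n`, which many but not all NMR entries have; the function name `representative_model` and the example text are made up for illustration. If no model is named, it falls back to the first MODEL record, as described above.

```python
import re

# Assumed layout of the REMARK 210 line that names a representative conformer.
_BEST = re.compile(
    r"^REMARK 210\s+BEST REPRESENTATIVE CONFORMER IN THIS ENSEMBLE\s*:\s*(\d+)",
    re.MULTILINE,
)
_MODEL = re.compile(r"^MODEL\s+(\d+)", re.MULTILINE)


def representative_model(pdb_text):
    """Return (model_number, reason) for the model to treat as representative."""
    hit = _BEST.search(pdb_text)
    if hit:
        return int(hit.group(1)), "named in REMARK 210"
    first = _MODEL.search(pdb_text)
    if first:
        # No model named by the authors: fall back to the first one.
        return int(first.group(1)), "first MODEL record (no model named)"
    return 1, "single-model file"


# Made-up header fragment, not taken from a real entry.
example = """\
REMARK 210 CONFORMERS, NUMBER SUBMITTED             : 24
REMARK 210 BEST REPRESENTATIVE CONFORMER IN THIS ENSEMBLE : 3
MODEL        1
ENDMDL
MODEL        2
ENDMDL
"""
print(representative_model(example))
```

A non-numeric value on that line (some entries have `NULL`) does not match the pattern, so the function falls back to the first model in that case too.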
Regardless, the differences between models are usually pretty small. If you're down to the level of detail at which these differences matter, you should consider using all of the models, since they all satisfy the experimental constraints.
EDITED TO ADD: I missed the additional question in the follow-up. Back in my day, we used custom software to find out how many structures were needed to adequately represent the structural space that is consistent with the experimental constraints. I don't know if there are publicly available tools to do this now, but XPLOR is really more of a crystallography tool, and crystallographers don't have this exact concern, so I'd be surprised if it does what you want.
Anyway, I think you need to decide whether the differences among the structures in the family matter or not for the question you are trying to answer. In most cases, just picking the first structure in the family will be sufficient. In cases where it isn't, then you probably need to find a way to include the entire family.
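One rough way to judge whether the differences matter is to superpose every model on the first one and look at the C-alpha RMSD. The sketch below is only an illustration: `read_models`, `superposed_rmsd` and `_atom_line` are made-up names, the parser reads only fixed-column ATOM records, the superposition is a plain Kabsch fit, and the two demo models are synthetic.

```python
import numpy as np


def read_models(pdb_text, atom_name="CA"):
    """Return {model_number: (n_atoms, 3) array} for one atom name."""
    models, current = {}, 1
    for line in pdb_text.splitlines():
        if line.startswith("MODEL"):
            current = int(line[5:].split()[0])
        elif line.startswith("ATOM") and line[12:16].strip() == atom_name:
            # x, y, z live in fixed columns 31-38, 39-46, 47-54.
            xyz = [float(line[30:38]), float(line[38:46]), float(line[46:54])]
            models.setdefault(current, []).append(xyz)
    return {number: np.array(xyz) for number, xyz in models.items()}


def superposed_rmsd(ref, mob):
    """RMSD of `mob` against `ref` after an optimal rigid-body fit (Kabsch)."""
    ref_c = ref - ref.mean(axis=0)
    mob_c = mob - mob.mean(axis=0)
    u, _, vt = np.linalg.svd(mob_c.T @ ref_c)
    d = np.sign(np.linalg.det(u @ vt))      # guard against a reflection
    rot = u @ np.diag([1.0, 1.0, d]) @ vt
    diff = mob_c @ rot - ref_c
    return float(np.sqrt((diff ** 2).sum() / len(ref)))


def _atom_line(serial, resseq, xyz):
    # Hypothetical helper: writes a fixed-column CA record for the demo only.
    x, y, z = xyz
    return f"ATOM  {serial:5d}  CA  ALA A{resseq:4d}    {x:8.3f}{y:8.3f}{z:8.3f}"


# Synthetic two-model "ensemble": model 2 is model 1 rotated and shifted.
ref = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [3.8, 3.8, 0.0], [3.8, 3.8, 3.8]])
rot90 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
moved = ref @ rot90.T + np.array([5.0, -2.0, 1.0])

lines = []
for number, coords in ((1, ref), (2, moved)):
    lines.append(f"MODEL     {number:4d}")
    lines += [_atom_line(i + 1, i + 1, xyz) for i, xyz in enumerate(coords)]
    lines.append("ENDMDL")

models = read_models("\n".join(lines))
first = models[min(models)]
for number, coords in sorted(models.items()):
    print(number, round(superposed_rmsd(first, coords), 3))
```

If the RMSD values are small compared with the scale of the effect you care about, the first model is probably enough; if not, that is a hint to carry the whole family through the analysis.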
I dug up the relevant paragraph from one of my (ancient) papers to help you think about this (and to remind myself of what we did):
"A final ensemble of 24 structures was selected by first ordering the structures on the basis of increasing restraint violation energies. Structures that had a total AMBER energy or a specific term of the force field greater than two standard deviations above the mean were carefully scrutinized for potential exclusion from the final ensemble. The minimum number of structures required to adequately represent the conformational space allowed by the data was 22, as determined using the FINDFAM program (Smith 1999). The number of structures in the final ensemble was selected to be similar to that used for previous calbindin D9k structures to facilitate comparison."
(This is the paper: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2373453/?tool=pubmed)
If you want to be really thorough, the paper describing the structure you want to use should have a similar paragraph in the materials and methods, and that will help you figure out what to do.
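For what it's worth, the first two steps in that quoted paragraph (order by restraint violation energy, then flag anything more than two standard deviations above the mean energy) are easy to sketch. This is not the code we used and it does not attempt what FINDFAM does; the function name `screen_ensemble` and all of the energy values are invented for illustration.

```python
import numpy as np


def screen_ensemble(violation_energy, total_energy, n_sd=2.0):
    """Order structures by violation energy and flag total-energy outliers.

    Returns (order, flagged, cutoff), where `order` and `flagged` hold
    zero-based structure indices.
    """
    violation_energy = np.asarray(violation_energy, dtype=float)
    total_energy = np.asarray(total_energy, dtype=float)
    # Lowest restraint violation energy first.
    order = np.argsort(violation_energy, kind="stable")
    # Flag structures more than n_sd sample standard deviations above the mean.
    cutoff = total_energy.mean() + n_sd * total_energy.std(ddof=1)
    flagged = [int(i) for i in order if total_energy[i] > cutoff]
    return [int(i) for i in order], flagged, float(cutoff)


# Invented numbers for eight candidate structures (arbitrary energy units).
violation = [5.1, 3.2, 4.0, 6.3, 2.8, 3.9, 4.4, 9.7]
total = [-1200, -1195, -1210, -1205, -1198, -1202, -1207, -900]

order, flagged, cutoff = screen_ensemble(violation, total)
print("ranked:", order)
print("scrutinize:", flagged, "cutoff:", round(cutoff, 1))
```

Flagged structures are candidates for a closer look, not automatic exclusions, which matches the "carefully scrutinized" wording in the paper.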
The advantage of PCA plots is that you immediately see how large the differences between conformations are compared with the differences between proteins. Why not just give it a try with everything you have?
But how are you planning to apply PCA to protein structures?
Interesting idea... So you would take the coordinates of every atom (or the center of each amino acid?) relative to the center of the protein, for every conformation estimate, and see whether specific variations (your principal components) occur between those conformations? Is that indeed what you plan to do? That should immediately show you, for instance, groups of conformations that share the same major conformational change (a different curve), separated from the rest. I have never seen anything like that, but I think the idea is nice. I would love to see the results.
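A minimal sketch of that idea in Python, assuming the conformations are already superposed on a common frame (otherwise the components mostly pick up rigid-body offsets). The function name `ensemble_pca` and the synthetic ensemble are made up for illustration; the PCA itself is just an SVD of the mean-centered, flattened coordinates.

```python
import numpy as np


def ensemble_pca(coords):
    """PCA of an ensemble given as an (n_models, n_atoms, 3) array.

    Assumes the models are already superposed on a common frame.
    Returns (scores, explained_variance_ratio).
    """
    coords = np.asarray(coords, dtype=float)
    n_models = coords.shape[0]
    flat = coords.reshape(n_models, -1)      # one row per conformation
    centered = flat - flat.mean(axis=0)      # subtract the mean structure
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u * s                           # projection onto each component
    variance = s ** 2
    return scores, variance / variance.sum()


# Synthetic ensemble: 10 models of 30 atoms with small noise; in the last
# five models the first 10 atoms are shifted by 2 units along x.
rng = np.random.default_rng(0)
base = rng.normal(scale=10.0, size=(30, 3))
ensemble = base + rng.normal(scale=0.05, size=(10, 30, 3))
ensemble[5:, :10, 0] += 2.0

scores, ratio = ensemble_pca(ensemble)
print("variance explained by PC1:", round(float(ratio[0]), 3))
print("PC1 scores:", np.round(scores[:, 0], 2))
```

In this toy case the two groups of conformations end up on opposite sides of zero along the first component, which is the kind of separation described above. For a real family you would fill the array from the MODEL records of the PDB file.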