I'm currently working with a data set of PDB files and I'm interested in the sizes of the residues (number of atoms per residue). I noticed that the number of atoms, len(residue.child_list), differs between proteins for the same residue type. For example, 'LEU' has 8 atoms in one protein but 19 in another!
My guess is that this is an error in the PDB file or in PDBParser(); either way, the differences are huge!
For example, in the structure 3OQ2:
r = model['B'][88] # residue at chain B position 88
r1 = model['B'][15] # residue at chain B position 15
In [287]: r.resname
Out[287]: 'VAL'
In [288]: r1.resname
Out[288]: 'VAL'
But
In [274]: len(r.child_list)
Out[274]: 16
In [276]: len(r1.child_list)
Out[276]: 7
So even within a single molecule there is a difference in the number of atoms. I'd like to know whether this is normal biologically or whether something is wrong. Thank you.
So are high-resolution crystal structures actually more accurate about the number of atoms, or are these hydrogen atoms an erratic side effect? Thank you for your answer.
You can consider them 'more accurate' in the sense that high-resolution data give a clearer density, which lets crystallographers unambiguously determine the position of even small hydrogen atoms. The problem here is that apparently not all hydrogen atoms could be properly determined, hence the discrepancy: a fully protonated valine has 7 heavy atoms plus 9 hydrogens, which accounts for your 16 versus 7. Depending on your goal, you may or may not need to bother about them. If you do, you can add all hydrogen atoms with a generic force field and energy-minimize the structure (for example with GROMACS), or use the MolProbity or WHAT IF servers for a similar purpose.
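If you only care about residue size, one option is to count heavy atoms and hydrogens separately, so that partly protonated residues compare equal. The snippet below is a minimal sketch, not a definitive recipe. It assumes each atom exposes an `element` attribute, as Biopython's Atom objects do; the helper `split_atom_counts` and the `FakeAtom` stand-ins are hypothetical names used only for illustration.

```python
from collections import namedtuple

# "D" is included on the assumption that deuterium should count as hydrogen.
HYDROGEN_ELEMENTS = {"H", "D"}


def split_atom_counts(residue):
    """Return (heavy, hydrogen) counts for an iterable of atoms.

    Each atom only needs an `element` attribute, so a Biopython
    Residue (which iterates over its atoms) can be passed directly.
    """
    heavy = hydrogen = 0
    for atom in residue:
        element = (atom.element or "").strip().upper()
        if element in HYDROGEN_ELEMENTS:
            hydrogen += 1
        else:
            heavy += 1
    return heavy, hydrogen


# Stand-in atoms (hypothetical) so the sketch runs without a PDB file.
FakeAtom = namedtuple("FakeAtom", ["name", "element"])
val_heavy = [FakeAtom(n, n[0]) for n in ("N", "CA", "C", "O", "CB", "CG1", "CG2")]
val_hydrogens = [
    FakeAtom(n, "H")
    for n in ("H", "HA", "HB", "HG11", "HG12", "HG13", "HG21", "HG22", "HG23")
]

print(split_atom_counts(val_heavy))                  # (7, 0)
print(split_atom_counts(val_heavy + val_hydrogens))  # (7, 9)
```

With a parsed structure you would call it as split_atom_counts(model['B'][88]). The first number should then agree for two residues of the same type, unless heavy atoms are also missing from the model.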
p.s. If the answer is satisfactory please do mark it as 'correct'.
Thank you again.