I'm trying to extract some information from a set of PDB XML files. The difficulty I'm having is associating ATOMs with CHAINs. In the "flat-file" format the chain ID is on the same line as the ATOM information, so it's pretty easy. However, in the XML version this information seems to be separated between the PDBx:pdbx_poly_seq_scheme and PDBx:atom_site records. I'm having trouble making sure that I'm associating the correct atom_site objects with the correct poly_seq objects. I've included some examples of each below; these come from pdb:1BIS.
The simple question: What IDs should I use to 'join' these records? It's not obvious from the documentation.
atom_sites:
<PDBx:atom_site id="1015">
<PDBx:B_iso_or_equiv>24.26</PDBx:B_iso_or_equiv>
<PDBx:B_iso_or_equiv_esd xsi:nil="true" />
<PDBx:Cartn_x>-4.448</PDBx:Cartn_x>
<PDBx:Cartn_x_esd xsi:nil="true" />
<PDBx:Cartn_y>-14.262</PDBx:Cartn_y>
<PDBx:Cartn_y_esd xsi:nil="true" />
<PDBx:Cartn_z>4.417</PDBx:Cartn_z>
<PDBx:Cartn_z_esd xsi:nil="true" />
<PDBx:auth_asym_id>A</PDBx:auth_asym_id>
<PDBx:auth_atom_id>O</PDBx:auth_atom_id>
<PDBx:auth_comp_id>LEU</PDBx:auth_comp_id>
<PDBx:auth_seq_id>172</PDBx:auth_seq_id>
<PDBx:group_PDB>ATOM</PDBx:group_PDB>
<PDBx:label_alt_id></PDBx:label_alt_id>
<PDBx:label_asym_id>A</PDBx:label_asym_id>
<PDBx:label_atom_id>O</PDBx:label_atom_id>
<PDBx:label_comp_id>LEU</PDBx:label_comp_id>
<PDBx:label_entity_id>1</PDBx:label_entity_id>
<PDBx:label_seq_id>126</PDBx:label_seq_id>
<PDBx:occupancy>1.00</PDBx:occupancy>
<PDBx:occupancy_esd xsi:nil="true" />
<PDBx:pdbx_PDB_ins_code xsi:nil="true" />
<PDBx:pdbx_PDB_model_num>1</PDBx:pdbx_PDB_model_num>
<PDBx:pdbx_formal_charge xsi:nil="true" />
<PDBx:type_symbol>O</PDBx:type_symbol>
</PDBx:atom_site>
<PDBx:atom_site id="1016">
<PDBx:B_iso_or_equiv>25.89</PDBx:B_iso_or_equiv>
<PDBx:B_iso_or_equiv_esd xsi:nil="true" />
<PDBx:Cartn_x>-3.267</PDBx:Cartn_x>
<PDBx:Cartn_x_esd xsi:nil="true" />
<PDBx:Cartn_y>-16.870</PDBx:Cartn_y>
<PDBx:Cartn_y_esd xsi:nil="true" />
<PDBx:Cartn_z>6.060</PDBx:Cartn_z>
<PDBx:Cartn_z_esd xsi:nil="true" />
<PDBx:auth_asym_id>A</PDBx:auth_asym_id>
<PDBx:auth_atom_id>CB</PDBx:auth_atom_id>
<PDBx:auth_comp_id>LEU</PDBx:auth_comp_id>
<PDBx:auth_seq_id>172</PDBx:auth_seq_id>
<PDBx:group_PDB>ATOM</PDBx:group_PDB>
<PDBx:label_alt_id></PDBx:label_alt_id>
<PDBx:label_asym_id>A</PDBx:label_asym_id>
<PDBx:label_atom_id>CB</PDBx:label_atom_id>
<PDBx:label_comp_id>LEU</PDBx:label_comp_id>
<PDBx:label_entity_id>1</PDBx:label_entity_id>
<PDBx:label_seq_id>126</PDBx:label_seq_id>
<PDBx:occupancy>1.00</PDBx:occupancy>
<PDBx:occupancy_esd xsi:nil="true" />
<PDBx:pdbx_PDB_ins_code xsi:nil="true" />
<PDBx:pdbx_PDB_model_num>1</PDBx:pdbx_PDB_model_num>
<PDBx:pdbx_formal_charge xsi:nil="true" />
<PDBx:type_symbol>C</PDBx:type_symbol>
</PDBx:atom_site>
poly_seqs:
<PDBx:pdbx_poly_seq_scheme asym_id="A" entity_id="1" mon_id="SER" seq_id="11">
<PDBx:auth_mon_id>SER</PDBx:auth_mon_id>
<PDBx:auth_seq_num>57</PDBx:auth_seq_num>
<PDBx:hetero>n</PDBx:hetero>
<PDBx:ndb_seq_num>11</PDBx:ndb_seq_num>
<PDBx:pdb_ins_code></PDBx:pdb_ins_code>
<PDBx:pdb_mon_id>SER</PDBx:pdb_mon_id>
<PDBx:pdb_seq_num>57</PDBx:pdb_seq_num>
<PDBx:pdb_strand_id>A</PDBx:pdb_strand_id>
</PDBx:pdbx_poly_seq_scheme>
<PDBx:pdbx_poly_seq_scheme asym_id="A" entity_id="1" mon_id="PRO" seq_id="12">
<PDBx:auth_mon_id>PRO</PDBx:auth_mon_id>
<PDBx:auth_seq_num>58</PDBx:auth_seq_num>
<PDBx:hetero>n</PDBx:hetero>
<PDBx:ndb_seq_num>12</PDBx:ndb_seq_num>
<PDBx:pdb_ins_code></PDBx:pdb_ins_code>
<PDBx:pdb_mon_id>PRO</PDBx:pdb_mon_id>
<PDBx:pdb_seq_num>58</PDBx:pdb_seq_num>
<PDBx:pdb_strand_id>A</PDBx:pdb_strand_id>
</PDBx:pdbx_poly_seq_scheme>
<PDBx:pdbx_poly_seq_scheme asym_id="A" entity_id="1" mon_id="GLY" seq_id="13">
<PDBx:auth_mon_id>GLY</PDBx:auth_mon_id>
<PDBx:auth_seq_num>59</PDBx:auth_seq_num>
<PDBx:hetero>n</PDBx:hetero>
<PDBx:ndb_seq_num>13</PDBx:ndb_seq_num>
<PDBx:pdb_ins_code></PDBx:pdb_ins_code>
<PDBx:pdb_mon_id>GLY</PDBx:pdb_mon_id>
<PDBx:pdb_seq_num>59</PDBx:pdb_seq_num>
<PDBx:pdb_strand_id>A</PDBx:pdb_strand_id>
</PDBx:pdbx_poly_seq_scheme>
If you're asking why I don't just use the flat files ... it's because I have some other structures which have only been generated in the XML format and cannot be regenerated in the flat-file format.
Thanks a bunch, Will
How exactly are you parsing this?
Taking the flat file entry for atom site 1015:
I will freely admit that I know nothing about PDB files, but is the 'A' in this entry a reference to the chain? If so, isn't that the same information as <PDBx:label_asym_id>A</PDBx:label_asym_id> in the atom_sites? I may have completely missed your point, so feel free to point it out if I have.
No, it does not seem to be the chain ID, since when I visually scan through records with multiple chains the label_asym_id doesn't change.
I'm parsing this with Python's xml library. I'm not having any trouble getting the info out, just linking these two pieces of information.
And I've noticed that the atom_sites and poly_schemes are not in the same order, so I can't just match based on residue identities.
If it's not clear what I'm asking for, then leave a comment and I'll try to clear it up :)
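Since the two record types are not in the same order, one option is to index the scheme rows by key and look each atom up, rather than matching by position. Below is a minimal sketch, assuming (as the sample records suggest) that the asym_id, entity_id, seq_id and mon_id attributes on pdbx_poly_seq_scheme correspond to label_asym_id, label_entity_id, label_seq_id and label_comp_id on atom_site. The helper names, the namespace URI and the seq_id="126" scheme row are illustrative assumptions, not taken from 1BIS.

```python
import xml.etree.ElementTree as ET

# Trimmed sample. The atom_site values come from the question; the
# seq_id="126" scheme row is made up for illustration (not copied from 1BIS).
# The namespace URI is an assumption: use whatever your file's root declares.
SAMPLE = """\
<PDBx:datablock xmlns:PDBx="http://pdbml.pdb.org/schema/pdbx-v40.xsd">
  <PDBx:atom_siteCategory>
    <PDBx:atom_site id="1015">
      <PDBx:auth_asym_id>A</PDBx:auth_asym_id>
      <PDBx:auth_seq_id>172</PDBx:auth_seq_id>
      <PDBx:label_asym_id>A</PDBx:label_asym_id>
      <PDBx:label_atom_id>O</PDBx:label_atom_id>
      <PDBx:label_comp_id>LEU</PDBx:label_comp_id>
      <PDBx:label_entity_id>1</PDBx:label_entity_id>
      <PDBx:label_seq_id>126</PDBx:label_seq_id>
    </PDBx:atom_site>
  </PDBx:atom_siteCategory>
  <PDBx:pdbx_poly_seq_schemeCategory>
    <PDBx:pdbx_poly_seq_scheme asym_id="A" entity_id="1" mon_id="SER" seq_id="11">
      <PDBx:pdb_seq_num>57</PDBx:pdb_seq_num>
      <PDBx:pdb_strand_id>A</PDBx:pdb_strand_id>
    </PDBx:pdbx_poly_seq_scheme>
    <PDBx:pdbx_poly_seq_scheme asym_id="A" entity_id="1" mon_id="LEU" seq_id="126">
      <PDBx:pdb_seq_num>172</PDBx:pdb_seq_num>
      <PDBx:pdb_strand_id>A</PDBx:pdb_strand_id>
    </PDBx:pdbx_poly_seq_scheme>
  </PDBx:pdbx_poly_seq_schemeCategory>
</PDBx:datablock>
"""


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def children(elem):
    """Map each child's local name to its text (None for empty elements)."""
    return {local(child.tag): child.text for child in elem}


def join_atoms_to_scheme(root):
    """Return (atom id, atom fields, matching scheme fields or None) tuples."""
    # Index the scheme rows by their four attributes, so file order is irrelevant.
    scheme = {}
    for elem in root.iter():
        if local(elem.tag) == "pdbx_poly_seq_scheme":
            key = (elem.get("asym_id"), elem.get("entity_id"),
                   elem.get("seq_id"), elem.get("mon_id"))
            scheme[key] = children(elem)

    joined = []
    for elem in root.iter():
        if local(elem.tag) == "atom_site":
            fields = children(elem)
            key = (fields.get("label_asym_id"), fields.get("label_entity_id"),
                   fields.get("label_seq_id"), fields.get("label_comp_id"))
            joined.append((elem.get("id"), fields, scheme.get(key)))
    return joined


root = ET.fromstring(SAMPLE)
for atom_id, atom, row in join_atoms_to_scheme(root):
    # Fall back to the atom's own auth_asym_id when no scheme row matches.
    chain = row["pdb_strand_id"] if row else atom.get("auth_asym_id")
    print(atom_id, atom["label_atom_id"], chain)
```

Atoms with no matching row come back with None in the third slot; that would be expected for anything that is not part of a polymer, such as waters or ligands. If all you need is the chain letter as it appears in the flat file, the auth_asym_id on the atom_site itself may already be enough, which is worth checking against a structure with more than one chain.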
Is there a 1-to-1 match between ATOMs and CHAINs? Can you provide a link to complete XML files?
When you say "in flat-file format the chain ID is on the same line as the ATOM", how specifically does the data look? For example, in http://www.pdb.org/pdb/files/1BIS.pdb, would the chain ID for the atom with id=1015 simply be "A"?
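For reference, here is a rough sketch of where the chain ID sits in a flat-file ATOM record, assuming the standard fixed-width column layout (chain ID in column 22). The line is rebuilt from the XML values of atom_site 1015 shown in the question, so it is a reconstruction and not a line copied from 1BIS.pdb; parse_atom_line is a hypothetical helper name.

```python
# Rebuild an ATOM line from the XML values of atom_site 1015 in the question,
# using the fixed-width ATOM layout. This is a reconstruction for illustration,
# not text copied from 1BIS.pdb.
line = (
    "ATOM  {:>5d} {:<4s}{:1s}{:>3s} {:1s}{:>4d}{:1s}   "
    "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}          {:>2s}"
).format(1015, " O", "", "LEU", "A", 172, "",
         -4.448, -14.262, 4.417, 1.00, 24.26, "O")


def parse_atom_line(line):
    """Slice the fixed-width fields out of one ATOM/HETATM line (0-based slices)."""
    return {
        "serial": int(line[6:11]),         # columns 7-11
        "atom_name": line[12:16].strip(),  # columns 13-16
        "res_name": line[17:20].strip(),   # columns 18-20
        "chain_id": line[21],              # column 22
        "res_seq": int(line[22:26]),       # columns 23-26
        "x": float(line[30:38]),           # columns 31-38
        "y": float(line[38:46]),           # columns 39-46
        "z": float(line[46:54]),           # columns 47-54
    }


atom = parse_atom_line(line)
print(atom["serial"], atom["chain_id"], atom["res_name"], atom["res_seq"])
```

In this layout the chain ID and residue number carry the author numbering, which is why they line up with auth_asym_id and auth_seq_id (A and 172) in the XML rather than with label_seq_id (126).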