Hi all,
I would like to retrieve the sequence index position for an amino acid residue from a .cif
file using Biopython's PDB
package. My goal is to get the sequence index of spatially neighboring AAs using Neighborsearch
I have the following piece of code.
IN:
import Biopython.PDB as bpdb
structure = bpdb.MMCIFParser().get_structure('subt', 'file.cif')
for chain in structure[0]:
for res in chain:
for atom in res:
atoms.append(atom)
neighbor = bpdb.NeighborSearch(atoms)
coords = []
for idx, residue in enumerate(structure[0].get_residues()):
if bpdb.is_aa(residue):
name = residue.get_resname()
ca_atom = residue['CA']
coords.append((idx, residue, ca_atom.get_coord()))
occupancy = ca_atom.get_occupancy()
if 33 < idx < 40:
print(f'AA:{name}, idx:{idx}, get_id:{residue.get_id()[1]}')
OUT:
AA:ILE, idx:34, get_id:35
AA:SER, idx:35, get_id:36
AA:THR, idx:36, get_id:38
AA:HIS, idx:37, get_id:39
AA:PRO, idx:38, get_id:40
AA:ASP, idx:39, get_id:41
Now as you can see, the sequential index idx
gives the real primary structure position of the AA, but the index, retrieved with the residue.get_id()
function, sometimes jumps (as can be seen at idx:37
) and therefore seems to indicate the wrong position. The problem becomes clear when searching for spatial neighbors, since I would like to find the (correct) sequence index of those neighbors:
IN:
radius = 5
coord = coords[36][2]
neighbors = neighbor.search(coord, radius, level='R')
for item in neighbors:
print(item)
OUT:
<Residue SER het= resseq=36 icode= >
<Residue HOH het=W resseq=1135 icode= >
<Residue THR het= resseq=38 icode= >
But as you can see is is not possible to trace back the position of THR
(get_id:38
against idx:36
)
Why does it jump and how can I get the correct get_id()?
Hope someone can help me out,
Stefan