Question

Syntax For Neighborsearch Module In Biopython

2

Entering edit mode

14.4 years ago

satsurae ▴ 120

Hi,

I'm having trouble with using this module, more specifically the 'search' function. How do i get the atom list from a pdb file (is this using PDBParser)? How do I use it to find the nearest residues from the co-crystallized ligand?

Cheers

biopython pdb parsing • 8.3k views

ADD COMMENT • link updated 6.9 years ago by jma.bullock • 0 • written 14.4 years ago by satsurae ▴ 120

1

Entering edit mode

Can you show us the code you already have got?

ADD REPLY • link 14.4 years ago by Michael Schubert ★ 7.1k

Ram · Answer 1 · 2010-07-22

5

Entering edit mode

14.4 years ago

Michael Schubert ★ 7.1k

The basic syntax of NeighborSearch is the following. If you tell me what you want to do exactly I can give you a more precise hint. Also, see the Bio.PDB FAQ (pdf).

The script basically loads the local pdb structure file molecule.pdb, creates a list of all atoms, gets the coordinates of the first atom, and prints a list of all resides within 5 angstrom of it (source).

from Bio.PDB import *
import sys
structure = PDBParser().get_structure('X', 'molecule.pdb') # load your molecule
atom_list = Selection.unfold_entities(structure, 'A') # A for atoms
ns = NeighborSearch(atom_list)
center = atom_list[0].get_coord()
neighbors = ns.search(center, 5.0) # 5.0 for distance in angstrom
residue_list = Selection.unfold_entities(neighbor_list, 'R') # R for residues
print residue_list

If you are interested in the neighbors of just a few atoms/residues, you might want to consider PyMOL since it has a nice set of GUI features that are more easily accessible.

ADD COMMENT • link updated 6.2 years ago by Ram 44k • written 14.4 years ago by Michael Schubert ★ 7.1k

0

Entering edit mode

Thanks very much for the reply.

Basically, what I want to do is find the nearest neighbors to the ligand bound in a protein. I need to do this for quite a few (over 200 pdbs) hence I wanted to use a script.

So in this test case, the ligand has a resseq=600 and the chain starts at resseq=307. Does this mean in center = atom_list[0].get_coord() the 0 would be 293?

Thanks again

ADD REPLY • link updated 6.2 years ago by Ram 44k • written 14.4 years ago by satsurae ▴ 120

0

Entering edit mode

No, because in one case you have a list of atoms and in the other residue IDs. In the above example, you could get all residues (instead of atoms) first, search for your ligand there, and then get the corresponding ligand atoms and perform NeighborSearch.

ADD REPLY • link 14.4 years ago by Michael Schubert ★ 7.1k

0

Entering edit mode

Hi I have large number of hetero-dimeric proteins. I need to check for all atoms of chain A for which atoms of chain B are within 10Å and need to obtain the list of residue numbers of those atoms as output. I am new to programming and I do not know how to write the code for this using the NeighborSearch module. Would you be able to help me out?

ADD REPLY • link 8.7 years ago by ramachandranakash1994 • 0

score 0 · Answer 2 · 2018-01-03

here is a script that calculates the fnat using modeller - and it is a bit more straightforward to tease apart the ligand and receptor (it should all work automatically, no messy renumbering stages). it is however quite slow, but maybe you will find it useful ... !

from modeller import *
from modeller.scripts import complete_pdb
import sys,os
import math
import numpy as np

def calculate_fnat(model,cut_off):
    fnat = {}
    for a in model.chains:
        for b in model.chains:
            if a != b:
                sel_a = selection(a)
                sel_b = selection(b)
                for ca_a in sel_a:
                    ax = ca_a.x
                    ay = ca_a.y
                    az = ca_a.z
                    for ca_b in sel_b:
                        bx = ca_b.x
                        by = ca_b.y
                        bz = ca_b.z

                        dist = math.sqrt(((ax-bx)**2)+((ay-by)**2)+((az-bz)**2))

                        if dist <= cut_off:
                            a_res  =  ca_a.residue
                            b_res  =  ca_b.residue
                            if (int(b_res._num)+1,b.name,int(a_res._num)+1, a.name) not in fnat:
                                if (int(a_res._num)+1,a.name,int(b_res._num)+1, b.name) not in fnat:
                                    fnat[int(a_res._num)+1,a.name,int(b_res._num)+1, b.name] = dist 
    return fnat

# initialise the environment .
env = environ()
env.libs.topology.read('${LIB}/top_heav.lib')
env.libs.parameters.read('${LIB}/par.lib')

#calculate native contacts
m1 = complete_pdb(env, 'native.pdb')
rmsd_sel = selection(m1).only_atom_types('CA')

fnat = calculate_fnat(m1,5)

filenames = []

for pdb_file in [l for l in os.listdir("./") if l.endswith(".pdb")]:
    print pdb_file
    filenames.append(pdb_file)

results = []    
# calculate model contacts                      
for fname in filenames:
    m2 = complete_pdb(env, fname)

    fnat_mod = calculate_fnat(m2,5)   

    count = float(0)

    #compare native contacts to model contacts
    for f in fnat:
        if f in fnat_mod:
            count += 1

    score = count/len(fnat)

    results.append([fname,score])

#output results
f1 = open('data_4.txt','w')
f1.write('fname\tfnat\n')    

for line in results:
    f1.write('%s\t%s\n' % (line[0],line[1]))

f1.close