I have recently been doing some work on protein structure and have run into a problem. I want to download all proteins with crystal structures in the PDB, then pick out the proteins that have two or three amino acids (e.g. cysteines) physically close to each other. So my question is: how can I calculate or predict the physical distance between given amino acids? Are there any tools that can do this?
A simple method is to extract the coordinates of every 'CA' (alpha-carbon) atom from the PDB file and then calculate the Euclidean distance between them.
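If you use Python, here is some simple code that can serve as a starting point. I would not recommend it for real analysis, because it is not optimized: it compares every pair of residues. Splitting lines on whitespace also fails on many PDB files, since columns can run together, especially in large multi-chain structures. The version below therefore slices the fixed-width columns instead.
    pdb_file_name = "protein.pdb"  # placeholder: set this to the path of your PDB file
    pdb_aa_cord = []
    with open(pdb_file_name) as handle:
        for line in handle:
            # Keep only alpha-carbon ATOM records (atom name is in columns 13-16)
            if line.startswith("ATOM") and line[12:16].strip() == "CA":
                label = (line[17:20], line[21], line[22:26].strip())  # residue name, chain, residue number
                pdb_aa_cord.append((label, float(line[30:38]), float(line[38:46]), float(line[46:54])))
    # Compare every unordered pair of CA atoms exactly once
    for i, x in enumerate(pdb_aa_cord):
        for y in pdb_aa_cord[i + 1:]:
            euc_dist = ((x[1] - y[1])**2 + (x[2] - y[2])**2 + (x[3] - y[3])**2) ** 0.5
            print("distance %s vs %s %f" % (" ".join(x[0]), " ".join(y[0]), euc_dist))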
You can also change how you calculate the distances, depending on what you want to know. For example, you can calculate the distance between the centres of mass of the side chains, or the distance between the two nearest atoms. In my experience, taking the side chains into account gives more relevant results than alpha-carbon distances alone. Whichever way you do it, it is just a matter of calculating Euclidean distances in 3D space, and the 3D coordinates are already in the file. It's pretty straightforward.
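To make the "nearest two atoms" idea concrete, here is a minimal sketch that groups side-chain atoms per residue and reports residue pairs whose closest side-chain atoms fall within a cutoff. The function names (read_side_chains, min_distance, close_pairs), the backbone atom set, and the 3.0 Å default cutoff are my own illustrative choices, not anything from the posts above. It assumes standard fixed-width ATOM records, ignores insertion codes and alternate locations, and needs Python 3.8+ for math.dist.

```python
from collections import defaultdict
from itertools import combinations
import math

# Backbone atom names to skip, so only side-chain atoms remain (an assumption
# of this sketch; hydrogens, if present in the file, are kept).
BACKBONE = {"N", "CA", "C", "O"}


def read_side_chains(lines, resname="CYS"):
    """Group side-chain coordinates by (chain ID, residue number)."""
    residues = defaultdict(list)
    for line in lines:
        # Residue name sits in columns 18-20 of an ATOM record
        if not line.startswith("ATOM") or line[17:20] != resname:
            continue
        if line[12:16].strip() in BACKBONE:
            continue
        key = (line[21], int(line[22:26]))
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        residues[key].append(xyz)
    return residues


def min_distance(atoms_a, atoms_b):
    """Smallest Euclidean distance between any atom of A and any atom of B."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def close_pairs(residues, cutoff=3.0):
    """Return (residue_a, residue_b, distance) for pairs within the cutoff."""
    pairs = []
    for (key_a, atoms_a), (key_b, atoms_b) in combinations(sorted(residues.items()), 2):
        d = min_distance(atoms_a, atoms_b)
        if d <= cutoff:
            pairs.append((key_a, key_b, d))
    return pairs
```

To use it, pass an open file handle to read_side_chains and hand the result to close_pairs. Each returned tuple holds the two (chain, residue number) keys and their nearest-atom distance in Å. Choose the cutoff to suit the interaction you are looking for.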
Is it possible to explain the above code a little bit?