Hello, I am trying to find the way how to calculate all atom distances from pdb file ? CAn anybody help me. I have a txt files that look like this :
ATOM 1279 C ALA 81 -1.925 -11.270 1.404
ATOM 1280 O ALA 81 -0.279 9.355 15.557
ATOM 1281 OXT ALA 81 -2.188 10.341 15.346
HETATM 1282 N THR 82 29.632 5.205 5.525
HETATM 1283 H1 THR 82 30.175 4.389 5.768
HETATM 1284 H2 THR 82 28.816 4.910 5.008
I wrote a script for calculating distances but still I am having some problem, it doesnt print any distances. I was trying to understand why but I am not sure sinse doesnt give me any error.
#!/usr/local/bin/perl
use strict;
use warnings;
open(IN, $ARGV[0]) or die "$!";
my (@refer, @points);
my $part = 0;
my $dist;
while (my $line = <IN>) {
chomp($line);
if ($line =~ /^HETATM/) {
$part++;
next;
}
my @array = (substr($line, 30, 8),substr($line,38,8),substr($line,46,8));
# print "@array\n";
if ($part == 0) {
push @refer, [ @array ];
} elsif ($part ==1){
push @points, [ @array ];
}
}
foreach my $ref(@refer) {
my ($x1, $y1, $z1) = @{$ref};
foreach my $atom(@points) {
my ($x, $y, $z) = @{$atom};
my $dist = sqrt( ($x-$x1)**2 + ($y-$y1)**2 + ($z-$z1)**2 );
print $dist;
}
}
This looks like homework, what have you tried?
It's some kind of personal homework, not for school but form me to make some dataset. So what I have done so far is wrote a script in perl that from the whole pdb dataset I kept only those pdb files that have ligand in its binding pocket (I included some cut off for number of heavy atoms to exclude ions and solvents, so ended up with pdb files that have ligands). I also extracted from each file protein and ligand coordinates into another txt file. So currently I am having a files that look like this: for example file. 2KRJ.txt:
The values for these coordinates are not the same of course. So I also found the formula fo calculating distances, but I am not sure how to calculate it: I want each atom (x,y,z coordinates) from protein (that starts with
^ATOM
) to be calculated with each atom from ligand (that starts with^HETATM
). If the number of each calculatation is more then 5 (I should assigned some cut off) then move the file or something like that.This is the formula: distance =
sqrt(($Ax - $Bx)**2 + ($Ay - $By)**2 + ($Az - $Bz)**2)
I am not that experienced in programming so trying to figure out how to approach to this problem. Thanks
So basically I am trying to calculate
ATOM(1)
with allHETATM(1,2,3,4)
and the same for each atom of the protein. if I end up with result that is <= to five then I should keep this file. Overall from all files I am trying to keep only those in which the distance between ligand and some protein atoms is less or equal to 5 Armstrongs.Ok, that is totally feasible in Perl with two nested loops or maps, but why don't you use R on the file you made? read the file using
atoms <- read.delim(my.file, sep="\t", header=T, row.names=1) ; dist(atoms);
that what be about 1/10 of the code required in Perl.