I'm attempting to observe local conformational changes by using principal component analysis between different crystal structures (similar to what is done for MD simulations, but this is between solved structures instead of simulated ones). I have the setup, but I don't quite understand why my eigenvalues are both so high and so low.
I am using 2 PDBs: 1mmi and 1jql
I remove rotation and translation by aligning (in pymol) 1jql chain A with 1mmi chain A. I'm only looking at the CA atoms for now, but I plan to incorporate multiple atoms later. I then generate a list of the x,y,z co-ordinates and generate a matrix using them. I calculate the covariance matrix, diagonalize, and derive eigenvalues and eigenvectures from this. Then plot the eigenvalues.
Here is the code:
import re
import numpy as np
from numpy import linalg as LA
import matplotlib.pyplot as plt
pdb_1='1mmi.pdb'
pdb_2='aligned_1jqlpdb'
coordinate_list_closed=[]
coordinate_list_open=[]
with open(pdb_1) as closed_conformation:
for lines in closed_conformation:
if re.search('ATOM\s+\d+\s+CA\s+A*[A-Z]{3}\s+A',lines.strip()) is not None:
coordinate_list_closed.append(float(lines.strip().split()[6]))
coordinate_list_closed.append(float(lines.strip().split()[7]))
coordinate_list_closed.append(float(lines.strip().split()[8]))
with open(pdb_2) as open_conformation:
for line in open_conformation:
if re.search('ATOM\s+\d+\s+CA\s+[A-Z]+\s+A',line.strip()) is not None:
coordinate_list_open.append(float(line.strip().split()[6]))
coordinate_list_open.append(float(line.strip().split()[7]))
coordinate_list_open.append(float(line.strip().split()[8]))
atom_array=np.stack((np.array(coordinate_list_closed),np.array(coordinate_list_open)),axis=1)
covariance_matrix=np.cov(atom_array)
eig_val,eig_vect=LA.eig(covariance_matrix)
plt.scatter(np.arange(len(eig_val)),eig_val)
plt.show()
The very first eigenvalue, regardless of atom, which array I start with in the data, co-ordinates, is always in the triple digit range. Which makes no sense. I am looking for conformational rearrangments in Angstroms...you can't have a change of 800A. So I don't know if I have the wrong setup, or am going about this the wrong way. Any help would greatly be appreciated!
I guess I'm a bit confused, because I thought that's exactly what you want to do. Such as in the case above, a protein of 366 amino acids, would give a covariance matrix of 1098x1098 (3Nx3N). You want to include all the atoms. You then apply that decomposition to this covariance matrix.