Am I understanding PCA for protein analysis correctly?
22 months ago
Sam • 0

Hello,

This is more a conceptual question.

Let's say you want to visualize conformational changes between different crystal structures of the same protein at a per-residue level. To do this, you decide to use PCA, so you need to perform an eigendecomposition of the covariance matrix.

For simplicity, let's say there are just 2 crystal structures and 2 residues. Each residue is represented by 3 coordinates, so you will have a 3N x 3N covariance matrix (where N is the number of amino acids), which has 3N eigenvectors. So in the setup here, our covariance matrix will be 6x6 with 6 eigenvectors. The x, y, and z coordinates for each residue are:

Protein 1
residue1=[20,50,8] 
residue2=[25,84,15]

Protein 2
residue1=[21,49,8] 
residue2=[24,80,14]

So if these were represented as a matrix (whose covariance would be taken), it would be an M x 3N matrix, where M is the number of protein structures being compared and N is the number of residues. In our case that is a 2x6 matrix, written here transposed (6x2, one column per structure) so that each row can be labelled:

 [20 21    # x of amino acid 1 for the 2 proteins
  50 49    # y of amino acid 1 for the 2 proteins
   8  8    # z of amino acid 1 for the 2 proteins
  25 24    # x of amino acid 2 for the 2 proteins
  84 80    # y of amino acid 2 for the 2 proteins
  15 14]   # z of amino acid 2 for the 2 proteins
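As a minimal sketch of the layout described above, the data matrix can be assembled with NumPy. The variable names (`protein1`, `protein2`, `X`) are illustrative choices, and the coordinates are the toy values from this post, not real structure data.

```python
import numpy as np

# Toy coordinates from the post: one row per residue, columns are x, y, z.
protein1 = np.array([[20, 50, 8], [25, 84, 15]], dtype=float)
protein2 = np.array([[21, 49, 8], [24, 80, 14]], dtype=float)

# Flatten each structure to (x1, y1, z1, x2, y2, z2) and stack them:
# rows are structures (M = 2), columns are coordinates (3N = 6).
X = np.stack([protein1.ravel(), protein2.ravel()])

print(X.shape)  # (2, 6)
print(X.T)      # the 6x2 layout written out above
```

Keeping structures as rows matches the usual "observations by variables" convention, so the covariance is taken over the columns.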

The covariance matrix, to just break down one line, would be such:

["variance of x of amino acid 1 across the 2 structures"
 "covariance of x and y of amino acid 1 across the 2 structures"
 "covariance of x and z of amino acid 1 across the 2 structures"
 "covariance of x of amino acid 1 and x of amino acid 2 across the 2 structures"
 "covariance of x of amino acid 1 and y of amino acid 2 across the 2 structures"
 "covariance of x of amino acid 1 and z of amino acid 2 across the 2 structures"]

This is the covariance matrix that then undergoes eigendecomposition to yield the eigenvalues and eigenvectors. You can see the covariance matrix will be 6x6 (I wrote out what the first row would be).
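The covariance and eigendecomposition steps can be sketched with NumPy as below. This assumes the sample covariance (`np.cov` divides by M - 1 by default) and uses `np.linalg.eigh`, which is intended for symmetric matrices. One thing the toy case makes visible: with only M = 2 structures the covariance matrix has rank at most M - 1 = 1, so only one eigenvalue is non-zero.

```python
import numpy as np

# Rows are structures, columns are (x1, y1, z1, x2, y2, z2) from the post.
X = np.array([[20, 50, 8, 25, 84, 15],
              [21, 49, 8, 24, 80, 14]], dtype=float)

# rowvar=False: each column is a variable, each row an observation.
C = np.cov(X, rowvar=False)            # shape (6, 6)

# eigh returns eigenvalues in ascending order; reorder to descending.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(C[0, 0])               # variance of x of residue 1 across the 2 structures
print(np.round(eigvals, 6))  # one non-zero eigenvalue, the rest are numerically zero
```

The columns of `eigvecs` are the eigenvectors, so `eigvecs[:, 0]` is the mode with the largest eigenvalue.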

If you look at the eigenvector with the highest eigenvalue, it will have 6 elements. From my understanding, the first 3 are the x, y, z components for amino acid 1, and the next 3 are the x, y, z components for amino acid 2. The eigenvalue tells you how large the overall change for these 2 amino acids is, and the eigenvector values tell you how much each individual coordinate has moved.

In this manner, one can determine how much an individual amino acid has changed between 2 crystal structures. Is my understanding of PCA correct?
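A rough sketch of the per-residue reading described above: reshape the top eigenvector to N x 3 and take the squared length of each residue's block. Note that `eigh` returns unit-length eigenvectors, so these numbers are relative weights that sum to 1 rather than absolute displacements; the overall scale sits in the eigenvalue. The sketch also assumes the structures were already superposed, since otherwise rigid-body motion is mixed into the mode. The names `pc1` and `weight` are illustrative.

```python
import numpy as np

X = np.array([[20, 50, 8, 25, 84, 15],
              [21, 49, 8, 24, 80, 14]], dtype=float)

C = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)

# eigh sorts ascending, so the last column is the mode with the largest eigenvalue.
pc1 = eigvecs[:, -1]

# One row per residue, columns are that residue's x, y, z components of the mode.
per_residue = pc1.reshape(-1, 3)

# Squared length of each block: the share of the mode carried by each residue.
weight = np.linalg.norm(per_residue, axis=1) ** 2

print(weight)  # roughly 0.1 for residue 1 and 0.9 for residue 2 in this toy case
```

The sign of an eigenvector is arbitrary, which is why the sketch reports squared lengths instead of raw components.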

Protein PCA Structures • 1.3k views
22 months ago

I would suggest reading the following paper that talks about the use and very common misuse of PCA plots:

  • Principal Component Analyses (PCA)-based findings in population genetic studies are highly biased and must be reevaluated, Scientific Reports 2022

https://www.nature.com/articles/s41598-022-14395-4

As you can see there, misusing and misinterpreting PCA plots is quite common in the life sciences.

The way I think of PCA is as a measure of whether ALL of the matrix's values can be decomposed into a "formulaic" and "simplified" manner where initial terms of the formula dominate and later terms are only small corrections.

Alternatively, when there is no structure in the matrix the terms in the formula will be equally important and have the same weight, in which case the formula is no simpler than the original matrix.
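One way to make the "initial terms dominate" idea concrete is the fraction of variance captured by each component. The sketch below uses synthetic matrices (not protein data) and a hypothetical helper, `explained_variance_ratio`, built on NumPy's SVD.

```python
import numpy as np

def explained_variance_ratio(A):
    """Fraction of total variance captured by each principal component.

    Rows of A are observations, columns are variables.
    """
    centered = A - A.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)  # singular values, descending
    return s**2 / np.sum(s**2)

rng = np.random.default_rng(0)  # synthetic data, purely illustrative

# "Structured" matrix: one dominant pattern plus small noise.
structured = np.outer(rng.normal(size=50), rng.normal(size=20))
structured += 0.1 * rng.normal(size=(50, 20))

# "Unstructured" matrix: independent noise only.
unstructured = rng.normal(size=(50, 20))

print(explained_variance_ratio(structured)[:3])    # the first term dominates
print(explained_variance_ratio(unstructured)[:3])  # variance is spread over many terms
```

When the first few ratios are close to 1 in total, a low-rank description is a good summary of the matrix; when they are not, it is no simpler than the original data.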


> The way I think of PCA is as a measure of whether ALL of the matrix's values can be decomposed into a "formulaic" and "simplified" manner where initial terms of the formula dominate and later terms are only small corrections.

I do believe this is how the eigenvector results should be interpreted (and why you sort them by eigenvalue). But my question is more about whether I am using this technique appropriately in this circumstance (i.e. how to use PCA for structural analysis).


Traditionally, people use RMSD for structural analysis. From the paper linked below:

The hallmark of protein structure comparison is the root mean square deviation (RMSD) between equivalent atom positions after the rigid modes of structural change have been removed. The RMSD defines an optimality criterion to determine the rotation and translation that best separate rigid-body from internal movements.

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-11-363

I would do a literature search to see whether replacing RMSD with something else (as the paper above attempts) caught on. AFAIK simple concepts like RMSD are hard to dislodge.
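For reference, RMSD after optimal rigid-body superposition can be sketched with the Kabsch algorithm in NumPy. This is a minimal illustration rather than what any particular structure package does; `kabsch_rmsd` is a hypothetical helper and the coordinates are random test points, not a real protein.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between (N, 3) coordinate sets P and Q after superposing P onto Q."""
    # Remove translation by centering both sets on their centroids.
    P0 = P - P.mean(axis=0)
    Q0 = Q - Q.mean(axis=0)

    # Optimal rotation from the SVD of the 3x3 cross-covariance matrix.
    U, _, Vt = np.linalg.svd(P0.T @ Q0)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against an improper rotation
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # rotation taking P0 onto Q0

    diff = P0 @ R.T - Q0
    return np.sqrt((diff ** 2).sum() / len(P))

# Illustrative check: Q is P rotated 30 degrees about z and then translated.
theta = np.deg2rad(30.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
rng = np.random.default_rng(1)
P = rng.normal(size=(10, 3))
Q = P @ Rz.T + np.array([5.0, -2.0, 1.0])

print(kabsch_rmsd(P, Q))  # close to 0: only rigid-body motion separates P and Q
```

Which atoms go into `P` and `Q` determines the fit, so superposing on different subsets of the protein gives different RMSD values.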

As to the original question, I doubt you could find a direct interpretation of PCA that allows you to infer structure. Read the other paper I linked, which discusses the problems in interpreting PCA distances in much simpler scenarios, let alone fine local structure. It shows how common it is to assume that a PCA representation translates directly into real measures.


Thank you for the link, I will definitely check the paper out. The problem with RMSD is the choice of reference for alignment. Depending on which portion of the protein one aligns (or whether one centers and aligns the entire protein), different parts of the protein will have different RMSDs. The local changes will therefore be massively skewed by global changes. I am trying to find a way to observe local changes while ignoring global changes. One such example I've been looking at is interaction maps (which interactions are formed or disrupted does not depend on alignment to the reference structure). I was thinking PCA would be another method that is independent of alignment (alignment to a reference structure is only needed to remove translational/rotational movement).
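A closely related alignment-free quantity, swapped in here for interaction maps, is the difference between intra-structure distance matrices: pairwise distances within a structure do not change under rotation or translation, so no reference alignment is needed. The sketch below uses the toy coordinates from the original post; `distance_matrix` is a hypothetical helper name.

```python
import numpy as np

def distance_matrix(coords):
    """Pairwise Euclidean distances between the rows of an (N, 3) array."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# Toy coordinates from the original post (one row per residue).
protein1 = np.array([[20, 50, 8], [25, 84, 15]], dtype=float)
protein2 = np.array([[21, 49, 8], [24, 80, 14]], dtype=float)

# Change in every residue-residue distance between the two structures.
delta = distance_matrix(protein2) - distance_matrix(protein1)

print(delta)  # off-diagonal entries show how the residue 1 to residue 2 distance changed
```

Because `delta` is built from internal distances only, it reports local rearrangements without being affected by how the two structures happen to be positioned in space.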
