OPTION :1 For those wanting to do this where they just define the settings and let Python/PyMOL handle the unfolding a segment of your specified protein
Without you needing to even install anything.
In the course of working out how to do this process, I generalized it as well. It is in form that is largely automated once you set the settings necessary. It works for entries in the Protein Data Bank if you have the PDB id code and the chain designation. Go here and press the 'launch binder
' badge (or alternatively just click here to launch) and then when the Jupyter session comes up, choose from the available notebooks, 'Demo of Unfolding Region of Protein Chain via PyMOL'. Then step through running the notebook. I suggest step through first running it with the settings there to see how it works and then change the settings to what you want for the next run.
It should be adjustable with some hand editing to use custom, private/un-released PDB files uploaded to the session, too. (You can drag and drop from your local system into the file browser panel that appears in the left side of the JupyterLab interface in the session.)
Download the modified PDB file from the temporary session.
OPTION 2: For OP and those wanting to do this in PyMOL with a script they run like in original post
Here is the original poster's main code updated to incorporate my suggestions and additional modifications for running in PyMOL:
from Bio.PDB import PDBParser, PDBIO
import warnings
import Bio.PDB
import math
import pandas as pd
import pymol
from pymol import cmd
warnings.filterwarnings("ignore")
input_pdb_file = "4fdi.pdb"
output_pdb_file = "modified_protein.pdb"
selected_chain = "A"
mutation_residue_number = 250
parser = PDBParser()
structure = parser.get_structure("protein", input_pdb_file)
three_to_one = {
"ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
"GLU": "E", "GLN": "Q", "GLY": "G", "HIS": "H", "ILE": "I",
"LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
"SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}
def convert_to_one_letter(three_letter_code):
three_letter_code = three_letter_code
if three_letter_code in three_to_one:
return three_to_one[three_letter_code]
else:
return None
df = pd.DataFrame(columns=("Residue_ID", "Residue_Name"))
for model in structure:
for chain in model:
if chain.id == selected_chain:
for i, residue in enumerate(chain):
residue_id = residue.id[1]
if residue_id in range((mutation_residue_number - 10) - 1, (mutation_residue_number + 10) + 2):
residue_name = residue.resname
df.loc[len(df)] = residue_id, residue_name
cmd.reinitialize()
cmd.load(input_pdb_file)
region_to_unfold_tuples = []
region_to_unfold_selection_tuples = []
for index, row in df.iterrows():
region_to_unfold_selection_tuples.append((f"{row.iloc[1]}_{row.iloc[0]}",f"resn {row.iloc[1]} and resi {row.iloc[0]} and chain {selected_chain}"))
region_to_unfold_tuples.append((f"{row.iloc[1]}`{row.iloc[0]}",f"resn {row.iloc[1]} and resi {row.iloc[0]} and chain {selected_chain}"))
region_to_unfold_tuples = tuple(region_to_unfold_tuples)
region_to_unfold_selection_tuples = tuple(region_to_unfold_selection_tuples)
resNumLst = df.Residue_ID
sys.stderr.write("Modifying dihedral angles...\\n")
previousResidue, currentResidue, nextResidue = (None for i in range(3))
for i in range(1, len(region_to_unfold_tuples) - 1):
previousResidue = region_to_unfold_tuples[i-1][0]
currentResidue = region_to_unfold_tuples[i][0]
nextResidue = region_to_unfold_tuples[i+1][0]
cmd.set_dihedral(f"/{input_pdb_file[:-4]}//{selected_chain}/{previousResidue}/C", f"/{input_pdb_file[:-4]}//{selected_chain}/{currentResidue}/N", f"/{input_pdb_file[:-4]}//{selected_chain}/{currentResidue}/CA", f"/{input_pdb_file[:-4]}//{selected_chain}/{currentResidue}/C", -120, quiet=0)
cmd.set_dihedral(f"/{input_pdb_file[:-4]}//{selected_chain}/{currentResidue}/N", f"/{input_pdb_file[:-4]}//{selected_chain}/{currentResidue}/CA", f"/{input_pdb_file[:-4]}//{selected_chain}/{currentResidue}/C", f"/{input_pdb_file[:-4]}//{selected_chain}/{nextResidue}/N", 140, quiet=0)
cmd.set_dihedral(f"/{input_pdb_file[:-4]}//{selected_chain}/{currentResidue}/CA", f"/{input_pdb_file[:-4]}//{selected_chain}/{currentResidue}/C", f"/{input_pdb_file[:-4]}//{selected_chain}/{nextResidue}/N", f"/{input_pdb_file[:-4]}//{selected_chain}/{nextResidue}/CA", 180, quiet=0)
If you open 4fdi.pdb
in PyMOL and run that (after installing Pandas via pip
and restarting, if not already installed) & then restrict the view to just chain A, you get:
In the course of working it out, I generalized it as well. It is in form that is largely automated once you set the settings necessary, and you don't even need to install anything to use it. It works for entries in the Protein Data Bank if you have the PDB id code and the chain designation. Go here and press the 'launch binder
' badge (or alternatively just click here to launch) and then when the Jupyter session comes up, choose from the available notebooks, 'Demo of Unfolding Region of Protein Chain via PyMOL - involving Pandas'. Then step through running the notebook. I suggest step through first running it with the settings there to see how it works and then change the settings to what you want for the next run. It should be adjustable with some hand editing to use custom, private PDB files uploaded to the session, too.
I also added a version with Pandas and Biopython left out because I don't see the need and less dependencies will make it easier to run for those only having PyMOL. See the link to that in the list of available notebooks or within this post above or
(Link to static version of the executed notebooks: static notebook for original poster that uses Pandas and BioPython & static notebook for streamlined approach. I don't suggest using these because the environment is quite complex and I supply a place to launch a working session without installing anything here; however, I am supplying them so those with advanced familiarity can assess the code and results.)
I'm not following the basis for the following:
The reference you point to says the extended conformation is -120 degrees for phi and 140 degrees for psi and 180 degrees for omega.
So why do you have zero for the two(?) angle settings you used?
On a related, yet separate note concerning the other part of that command:
I'm not sure your method of specifying the atom selections as part of the dihedral command is correct. For example, with your code, you first phi angle change equates to this:
When I enter that, I get an error:
I see this working just fine though and it changes the structure:
Note that is very different than the equivalent of what you had.
Translating most of that to the end of what you had would be something like this:
I'll leave it for you to add the omega angle, too.
Despite selecting the angles (-120, 140, 180) as suggested, I'm still unable to achieve my desired outcome. However, after adjusting the angles to -180, 180, and 180, I've partially achieved my aim. One issue I'm encountering now is the replication of atoms in the selected residues.
By looking at the image below, you can observe that atoms are being duplicated as marked in the picture. This issue arises solely from the adjustment of the dihedral angles.
You created the duplicates with that line:
cmd.create()
is discussed here.cmd.select)
is discussed here.`If I run
cmd.create("TYR_240", "resn TYR and resi 240 and chain A")
in PyMol with 4fdi open it creates the residue as a separate object.I think you meant to make a selection with it?
Actually, I couldn't understand why you were doing that create step in the original code you posted?
object_list
is the only thing that gets made from it and you are only using that to count/track the residues. There's better ways to make a collection like that in Python to track than make a PyMol object.When I replied to you originally, I didn't look at the atoms you had specifying the dihedral angles. I was just catching the angles looked off. But now I looked at the atoms:
In your original post you had:
So that is the C(i-1), N(i), Ca(i), and C(i), which matches with that atoms I find specifying phi, too.
And now you have:
And that doesn't match with C(i-1), N(i), Ca(i), and C(i). ???
Also
In your original post you have:
That is C,C,N,Ca(i+1). That doesn't match anything I see for psi. ???
Then in your latest post you have for psi:
That's Ca,C,N,Ca (i+1), which doesn't match anything I see for psi either. ???
In the original question (main question), I initially created selection objects for each residue, which is unnecessary now. Instead, I directly modified the phi, psi, and omega angles since I had already extracted the residue ranges and saved it in a dataframe. It also resolved duplication of the atoms. Later, I realized that the atoms mentioned earlier were incorrect. The proper way of changing torsion angles is as follows, which I am currently utilizing:
phi: N(i-1), CA(i-1), C(i-1), N(i) || psi: CA(i), C(i), N(i+1), CA(i+1) || omega: C(i+1), N(i+2), CA(i+2), C(i+2)
I got the reference from the below image http://www.bioinf.org.uk/teaching/bioc0008/page03.htmlIf you look at the below image, you’ll see that the residues are experiencing clashes with neighboring residues due to the fixed angles (-180, 180, and 180) for phi, psi, and omega angles, respectively. Is there an alternative approach to linear the structure without significant clashes or with minimal clashes? I've also attempted energy minimization using the PyMOL "minimize" command
Before discussing clashes, I think it is imperative to get the correct angles and values for applying to 'linearize'.
Please cite and reference sources in some manner. A link will do. That is just a general good habit in science and important when discussing specifics such as this in these types of forums.
I don't believe those updated angle definitions are correct. I don't think you are reading that image correctly. You were closer originally in your first post.
I read psi from that image as contributed by atoms making up " the N(i),Ca(i),C(i),N(i+1)" dihedral angle, which agrees with the very top here on the Proteopedia page about Phi and Psi angles and also under the 'Proteins' heading at the wikipedia page on Dihedreal angles in stereochemistry.
Similarly, from the image you provide, I read the defining atoms of the Phi angle as what I suspect as well. (Technically, Phi cannot be read from that image you share because not all the atoms involved in defining it are even shown.) I see Phi as "the C(i-1),N(i),Ca(i),C(i) torsion angle" citing the same reference as for psi.
Unfortunately the Proteopedia page doesn't address omega. I'd interpret omega in your figure to be Ca(i), C'(i), N(i+1), Ca(i+1), with the last defining atom not even visible in your figure. I wish the wikipedia page indicated better what atoms belong to which residue in the text version they show. And further complicating things is that though they show an omega* angle in the figure on the right, as of now I think that is the omega angle for the previous residue, relative the other angles indicated I do note that the wikipedia page adds a caveat about the diagram used there:
I suspect it is in reference to that; however, I sure wish the wikipedia post had better specified. Omega define in the text at the top of this post here agrees with your image, and the defining atoms as Ca(i), C'(i), N(i+1), Ca(i+1) that I gathered from your image. Again, though this is at odds with what you have for omega.
I have already cited what I believe the values for the specific phi, psi, and omega angles should be as set in the Tien et al 2013 reference I cite above. The omega is the easiest to be sure of as planarity of the peptide bond usually restricts omega to be 180.