How can I linearize certain residues within an existing protein structure to create a dumbbell-shaped structure?
1
0
Entering edit mode
11 months ago

Hello Everyone,

I am trying to linearize certain residues of the protein (PDB ID: 4FDI) to achieve a dumbbell-shaped structure, for example, residues 250 to 270. I attempted to modify the residue’s angle by adjusting the phi and psi angles and rotating the residue's axes to model a dumbbell-shaped structure. I referred to a guide from this question (How to "linearize" or "unfold" a PDB format structure to secondary structure?) for assistance. Since I'm not proficient with Chimera, I am using the PyMOL API instead.

from Bio.PDB import PDBParser, PDBIO
import warnings
import Bio.PDB
import math
import pandas as pd
import pymol
from pymol import cmd
warnings.filterwarnings("ignore")

# Specify file paths
input_pdb_file = "4fdi.pdb"
output_pdb_file = "modified_protein.pdb"
selected_chain = "A"
mutation_residue_number = 250

# Parse the PDB file
parser = PDBParser()
structure = parser.get_structure("protein", input_pdb_file)

# Function to convert three-letter codes to one-letter codes
three_to_one = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLU": "E", "GLN": "Q", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def convert_to_one_letter(three_letter_code):
    three_letter_code = three_letter_code
    if three_letter_code in three_to_one:
        return three_to_one[three_letter_code]
    else:
        return None

# Iterate through residues
df = pd.DataFrame(columns=("Residue_ID", "Residue_Name"))
for model in structure:
    for chain in model:
        if chain.id == selected_chain:
            for i, residue in enumerate(chain):
                residue_id = residue.id[1]
                if residue_id in range((mutation_residue_number - 10), (mutation_residue_number + 10)):
                    residue_name = residue.resname
                    df.loc[len(df)] = residue_id, residue_name
# print(df)
cmd.reinitialize()
cmd.load(input_pdb_file)

# Make Selection object on the specify chain 
for index, row in df.iterrows():
    cmd.create(f"{row[1]}_{row[0]}", f"resn {row[1]} and resi {row[0]} and chain A")

object_list = cmd.get_object_list()[1:]
resLst = df.Residue_ID

preObjVal, currObjVal, nextobjVal, preResVal, currResVal, nextResVal = (None for i in range(6))
for i in range(1, len(object_list) -1):
    preObjVal, currObjVal, nextobjVal = object_list[i - 1:i + 2]
    preResVal, currResVal, nextResVal = resLst[i - 1:i + 2]

    cmd.set_dihedral(f"{preObjVal}///{preResVal}/C", f"{currObjVal}///{currResVal}/N", f"{currObjVal}///{currResVal}/CA", f"{currObjVal}///{currResVal}/C", 0) # phi
    cmd.set_dihedral(f"{currObjVal}///{currResVal}/C", f"{currObjVal}///{currResVal}/C", f"{currObjVal}///{currResVal}/N", f"{nextobjVal}///{nextResVal}/CA", 0) # psi

I got the below output:

pymol python biopython structural-bioinformatics • 1.8k views
ADD COMMENT
0
Entering edit mode

I'm not following the basis for the following:

cmd.set_dihedral(f"{preObjVal}///{preResVal}/C", f"{currObjVal}///{currResVal}/N", f"{currObjVal}///{currResVal}/CA", f"{currObjVal}///{currResVal}/C", 0) # phi
cmd.set_dihedral(f"{currObjVal}///{currResVal}/C", f"{currObjVal}///{currResVal}/C", f"{currObjVal}///{currResVal}/N", f"{nextobjVal}///{nextResVal}/CA", 0) # psi

The reference you point to says the extended conformation is -120 degrees for phi and 140 degrees for psi and 180 degrees for omega.

"We set the default for the backbone dihedral angles to the extended conformation ( phi = -120 degrees, psi = 140 degrees, omega = 180 degrees)." - SOURCE: 'PeptideBuilder: A simple Python library to generate model peptides'. Tien, Sydykova, Meyer, and Wilke. 2013. PeerJ

So why do you have zero for the two(?) angle settings you used?


On a related, yet separate note concerning the other part of that command:

I'm not sure your method of specifying the atom selections as part of the dihedral command is correct. For example, with your code, you first phi angle change equates to this:

cmd.set_dihedral('TYR_240///240/C', 'ALA_241///241/N', 'ALA_241///241/CA', 'ALA_241///241/C', 0)

When I enter that, I get an error:

Selector-Error: Invalid selection name "TYR_240".

I see this working just fine though and it changes the structure:

cmd.set_dihedral('/4fdi//A/TYR`240/C', '/4fdi//A/ALA`241/N', '/4fdi//A/ALA`241/CA', '/4fdi//A/ALA`241/C', 0)

Note that is very different than the equivalent of what you had.



Translating most of that to the end of what you had would be something like this:

# Make Selection object on the specify chain 
for index, row in df.iterrows():
    cmd.create(f"{row[1]}_{row[0]}", f"resn {row[1]} and resi {row[0]} and chain A")

object_list = cmd.get_object_list()[1:]
resLst = df.Residue_ID

preObjVal, currObjVal, nextobjVal, preResVal, currResVal, nextResVal = (None for i in range(6))
for i in range(1, len(object_list) -1):
    preObjVal, currObjVal, nextobjVal = object_list[i - 1:i + 2]
    #print(preObjVal)
    preResVal, currResVal, nextResVal = resLst[i - 1:i + 2]
    #print(preResVal)
    preObjVal = preObjVal.replace("_","`")
    currObjVal = currObjVal.replace("_","`")
    nextobjVal = nextobjVal.replace("_","`")

    cmd.set_dihedral(f"/4fdi//A/{preObjVal}/C", f"/4fdi//A/{currObjVal}/N", f"/4fdi//A/{currObjVal}/CA", f"/4fdi//A/{currObjVal}/C", -120) # phi
    cmd.set_dihedral(f"/4fdi//A/{currObjVal}/C", f"/4fdi//A/{currObjVal}/C", f"/4fdi//A/{currObjVal}/N", f"/4fdi//A/{nextobjVal}/CA", 140)# psi

I'll leave it for you to add the omega angle, too.

ADD REPLY
0
Entering edit mode

Despite selecting the angles (-120, 140, 180) as suggested, I'm still unable to achieve my desired outcome. However, after adjusting the angles to -180, 180, and 180, I've partially achieved my aim. One issue I'm encountering now is the replication of atoms in the selected residues.

preObjVal, currObjVal, nextobjVal, nextTonextobjVal, preResVal, currResVal, nextResVal, nextTonextResVal = (None for i in range(8))
for i in range(1, len(object_list), 4):
    preObjVal, currObjVal, nextobjVal, nextTonextobjVal = object_list[i - 1:i + 3]
    preResVal, currResVal, nextResVal, nextTonextResVal = resLst[i - 1:i + 3]

    # print(preResVal, currResVal, nextResVal, nextTonextResVal)

    preObjVal = preObjVal.replace("_","`")
    currObjVal = currObjVal.replace("_","`")
    nextobjVal = nextobjVal.replace("_","`")
    nextTonextobjVal = nextTonextobjVal.replace("_","`")

    cmd.set_dihedral(f"/4fdi//A/{preObjVal}/N", f"/4fdi//A/{preObjVal}/CA", f"/4fdi//A/{preObjVal}/C", f"/4fdi//A/{currObjVal}/N", -180, quiet=0) # phi
    cmd.set_dihedral(f"/4fdi//A/{currObjVal}/CA", f"/4fdi//A/{currObjVal}/C", f"/4fdi//A/{nextobjVal}/N", f"/4fdi//A/{nextobjVal}/CA", 180, quiet=0) # psi
    cmd.set_dihedral(f"/4fdi//A/{nextobjVal}/C", f"/4fdi//A/{nextTonextobjVal}/N", f"/4fdi//A/{nextTonextobjVal}/CA", f"/4fdi//A/{nextTonextobjVal}/C", 180, quiet=0) # omega

enter image description here

 

By looking at the image below, you can observe that atoms are being duplicated as marked in the picture. This issue arises solely from the adjustment of the dihedral angles. enter image description here

ADD REPLY
0
Entering edit mode

You created the duplicates with that line:

cmd.create(f"{row[1]}_{row[0]}", f"resn {row[1]} and resi {row[0]} and chain A")

cmd.create() is discussed here.
cmd.select) is discussed here.`

If I run cmd.create("TYR_240", "resn TYR and resi 240 and chain A") in PyMol with 4fdi open it creates the residue as a separate object.
I think you meant to make a selection with it?

Actually, I couldn't understand why you were doing that create step in the original code you posted? object_list is the only thing that gets made from it and you are only using that to count/track the residues. There's better ways to make a collection like that in Python to track than make a PyMol object.

ADD REPLY
0
Entering edit mode

When I replied to you originally, I didn't look at the atoms you had specifying the dihedral angles. I was just catching the angles looked off. But now I looked at the atoms:

In your original post you had:

cmd.set_dihedral(f"{preObjVal}///{preResVal}/C", f"{currObjVal}///{currResVal}/N", f"{currObjVal}///{currResVal}/CA", f"{currObjVal}///{currResVal}/C", 0) # phi

So that is the C(i-1), N(i), Ca(i), and C(i), which matches with that atoms I find specifying phi, too.

And now you have:

cmd.set_dihedral(f"/4fdi//A/{preObjVal}/N", f"/4fdi//A/{preObjVal}/CA", f"/4fdi//A/{preObjVal}/C", f"/4fdi//A/{currObjVal}/N", -180, quiet=0) # phi

And that doesn't match with C(i-1), N(i), Ca(i), and C(i). ???

Also

In your original post you have:

cmd.set_dihedral(f"/4fdi//A/{currObjVal}/C", f"/4fdi//A/{currObjVal}/C", f"/4fdi//A/{currObjVal}/N", f"/4fdi//A/{nextobjVal}/CA", 140)# psi

That is C,C,N,Ca(i+1). That doesn't match anything I see for psi. ???

Then in your latest post you have for psi:

cmd.set_dihedral(f"/4fdi//A/{currObjVal}/CA", f"/4fdi//A/{currObjVal}/C", f"/4fdi//A/{nextobjVal}/N", f"/4fdi//A/{nextobjVal}/CA", 180, quiet=0) # psi

That's Ca,C,N,Ca (i+1), which doesn't match anything I see for psi either. ???

ADD REPLY
0
Entering edit mode

In the original question (main question), I initially created selection objects for each residue, which is unnecessary now. Instead, I directly modified the phi, psi, and omega angles since I had already extracted the residue ranges and saved it in a dataframe. It also resolved duplication of the atoms. Later, I realized that the atoms mentioned earlier were incorrect. The proper way of changing torsion angles is as follows, which I am currently utilizing: phi: N(i-1), CA(i-1), C(i-1), N(i) || psi: CA(i), C(i), N(i+1), CA(i+1) || omega: C(i+1), N(i+2), CA(i+2), C(i+2) I got the reference from the below image http://www.bioinf.org.uk/teaching/bioc0008/page03.html

If you look at the below image, you’ll see that the residues are experiencing clashes with neighboring residues due to the fixed angles (-180, 180, and 180) for phi, psi, and omega angles, respectively. Is there an alternative approach to linear the structure without significant clashes or with minimal clashes? I've also attempted energy minimization using the PyMOL "minimize" command

cmd.reinitialize()
cmd.load(input_pdb_file)

preObjVal, currObjVal, nextobjVal, nextTonextobjVal, preResVal, currResVal, nextResVal, nextTonextResVal = (None for i in range(8))
for i in range(1, len(df), 4):
    preObjVal, currObjVal, nextobjVal, nextTonextobjVal = (f"{df.Residue_Name[j]}`{df.Residue_ID[j]}" for j in range(i-1, i+3))

    cmd.set_dihedral(f"/4fdi//A/{preObjVal}/N", f"/4fdi//A/{preObjVal}/CA", f"/4fdi//A/{preObjVal}/C", f"/4fdi//A/{currObjVal}/N", -180, quiet=0) # phi
    cmd.set_dihedral(f"/4fdi//A/{currObjVal}/CA", f"/4fdi//A/{currObjVal}/C", f"/4fdi//A/{nextobjVal}/N", f"/4fdi//A/{nextobjVal}/CA", 180, quiet=0) # psi
    cmd.set_dihedral(f"/4fdi//A/{nextobjVal}/C", f"/4fdi//A/{nextTonextobjVal}/N", f"/4fdi//A/{nextTonextobjVal}/CA", f"/4fdi//A/{nextTonextobjVal}/C", 180, quiet=0) # omega

cmd.rebuild()
cmd.save(output_pdb_file, state=-1, format='pdb')
ADD REPLY
0
Entering edit mode

Before discussing clashes, I think it is imperative to get the correct angles and values for applying to 'linearize'.

"I got the reference from the below image"

Please cite and reference sources in some manner. A link will do. That is just a general good habit in science and important when discussing specifics such as this in these types of forums.

"which I am currently utilizing: phi: N(i-1), CA(i-1), C(i-1), N(i) || psi: CA(i), C(i), N(i+1), CA(i+1) || omega: C(i+1), N(i+2), CA(i+2), C(i+2)"

I don't believe those updated angle definitions are correct. I don't think you are reading that image correctly. You were closer originally in your first post.
I read psi from that image as contributed by atoms making up " the N(i),Ca(i),C(i),N(i+1)" dihedral angle, which agrees with the very top here on the Proteopedia page about Phi and Psi angles and also under the 'Proteins' heading at the wikipedia page on Dihedreal angles in stereochemistry.

Similarly, from the image you provide, I read the defining atoms of the Phi angle as what I suspect as well. (Technically, Phi cannot be read from that image you share because not all the atoms involved in defining it are even shown.) I see Phi as "the C(i-1),N(i),Ca(i),C(i) torsion angle" citing the same reference as for psi.

Unfortunately the Proteopedia page doesn't address omega. I'd interpret omega in your figure to be Ca(i), C'(i), N(i+1), Ca(i+1), with the last defining atom not even visible in your figure. I wish the wikipedia page indicated better what atoms belong to which residue in the text version they show. And further complicating things is that though they show an omega* angle in the figure on the right, as of now I think that is the omega angle for the previous residue, relative the other angles indicated I do note that the wikipedia page adds a caveat about the diagram used there:

"The figure at right illustrates the location of each of these angles (but it does not show correctly the way they are defined)."

I suspect it is in reference to that; however, I sure wish the wikipedia post had better specified. Omega define in the text at the top of this post here agrees with your image, and the defining atoms as Ca(i), C'(i), N(i+1), Ca(i+1) that I gathered from your image. Again, though this is at odds with what you have for omega.

I have already cited what I believe the values for the specific phi, psi, and omega angles should be as set in the Tien et al 2013 reference I cite above. The omega is the easiest to be sure of as planarity of the peptide bond usually restricts omega to be 180.

ADD REPLY
2
Entering edit mode
10 months ago
Wayne ★ 2.1k

OPTION :1 For those wanting to do this where they just define the settings and let Python/PyMOL handle the unfolding a segment of your specified protein

Without you needing to even install anything.

In the course of working out how to do this process, I generalized it as well. It is in form that is largely automated once you set the settings necessary. It works for entries in the Protein Data Bank if you have the PDB id code and the chain designation. Go here and press the 'launch binder' badge (or alternatively just click here to launch) and then when the Jupyter session comes up, choose from the available notebooks, 'Demo of Unfolding Region of Protein Chain via PyMOL'. Then step through running the notebook. I suggest step through first running it with the settings there to see how it works and then change the settings to what you want for the next run.
It should be adjustable with some hand editing to use custom, private/un-released PDB files uploaded to the session, too. (You can drag and drop from your local system into the file browser panel that appears in the left side of the JupyterLab interface in the session.) Download the modified PDB file from the temporary session.


OPTION 2: For OP and those wanting to do this in PyMOL with a script they run like in original post

Here is the original poster's main code updated to incorporate my suggestions and additional modifications for running in PyMOL:

from Bio.PDB import PDBParser, PDBIO
import warnings
import Bio.PDB
import math
import pandas as pd
import pymol
from pymol import cmd
warnings.filterwarnings("ignore")

# Specify file paths
input_pdb_file = "4fdi.pdb"
output_pdb_file = "modified_protein.pdb"
selected_chain = "A"
mutation_residue_number = 250

# Parse the PDB file
parser = PDBParser()
structure = parser.get_structure("protein", input_pdb_file)

# Function to convert three-letter codes to one-letter codes
three_to_one = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLU": "E", "GLN": "Q", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def convert_to_one_letter(three_letter_code):
    three_letter_code = three_letter_code
    if three_letter_code in three_to_one:
        return three_to_one[three_letter_code]
    else:
        return None

# Iterate through residues
df = pd.DataFrame(columns=("Residue_ID", "Residue_Name"))
for model in structure:
    for chain in model:
        if chain.id == selected_chain:
            for i, residue in enumerate(chain):
                residue_id = residue.id[1]
                if residue_id in range((mutation_residue_number - 10) - 1, (mutation_residue_number + 10) + 2):
                    residue_name = residue.resname
                    df.loc[len(df)] = residue_id, residue_name
# print(df)
cmd.reinitialize()
cmd.load(input_pdb_file)

# Make Tuples of the Region to unfold on the chain
region_to_unfold_tuples = []
region_to_unfold_selection_tuples = []
for index, row in df.iterrows():
    #print(f"{row.iloc[1]}_{row.iloc[0]}", f"resn {row.iloc[1]} and resi {row.iloc[0]} and chain {selected_chain}")
    region_to_unfold_selection_tuples.append((f"{row.iloc[1]}_{row.iloc[0]}",f"resn {row.iloc[1]} and resi {row.iloc[0]} and chain {selected_chain}"))
    region_to_unfold_tuples.append((f"{row.iloc[1]}`{row.iloc[0]}",f"resn {row.iloc[1]} and resi {row.iloc[0]} and chain {selected_chain}"))
# to insure order cannot be changed, convert the collected lists into tuple of tuples
region_to_unfold_tuples = tuple(region_to_unfold_tuples)
region_to_unfold_selection_tuples = tuple(region_to_unfold_selection_tuples)


#object_list = cmd.get_object_list()[1:]
resNumLst = df.Residue_ID # No longer needed, I think

sys.stderr.write("Modifying dihedral angles...\\n")
previousResidue, currentResidue, nextResidue = (None for i in range(3))
for i in range(1, len(region_to_unfold_tuples) - 1):
    #Note the minus one for the upper range comes from that want to process through altering the angles for
    # the penultimate residue in region where collected information because the collected information includes 
    # information the one following the final residue to alter because using information about atoms of 
    # that following residue to specify the psi & omega angles, but don't want to alter the residue after the specified one.
    previousResidue = region_to_unfold_tuples[i-1][0]
    currentResidue = region_to_unfold_tuples[i][0]
    nextResidue = region_to_unfold_tuples[i+1][0]


    cmd.set_dihedral(f"/{input_pdb_file[:-4]}//{selected_chain}/{previousResidue}/C", f"/{input_pdb_file[:-4]}//{selected_chain}/{currentResidue}/N", f"/{input_pdb_file[:-4]}//{selected_chain}/{currentResidue}/CA", f"/{input_pdb_file[:-4]}//{selected_chain}/{currentResidue}/C", -120, quiet=0) # phi
    cmd.set_dihedral(f"/{input_pdb_file[:-4]}//{selected_chain}/{currentResidue}/N", f"/{input_pdb_file[:-4]}//{selected_chain}/{currentResidue}/CA", f"/{input_pdb_file[:-4]}//{selected_chain}/{currentResidue}/C", f"/{input_pdb_file[:-4]}//{selected_chain}/{nextResidue}/N", 140, quiet=0)# psi
    cmd.set_dihedral(f"/{input_pdb_file[:-4]}//{selected_chain}/{currentResidue}/CA", f"/{input_pdb_file[:-4]}//{selected_chain}/{currentResidue}/C", f"/{input_pdb_file[:-4]}//{selected_chain}/{nextResidue}/N", f"/{input_pdb_file[:-4]}//{selected_chain}/{nextResidue}/CA", 180, quiet=0) # omega

If you open 4fdi.pdb in PyMOL and run that (after installing Pandas via pip and restarting, if not already installed) & then restrict the view to just chain A, you get:

4fdi_unfolded_via_script

In the course of working it out, I generalized it as well. It is in form that is largely automated once you set the settings necessary, and you don't even need to install anything to use it. It works for entries in the Protein Data Bank if you have the PDB id code and the chain designation. Go here and press the 'launch binder' badge (or alternatively just click here to launch) and then when the Jupyter session comes up, choose from the available notebooks, 'Demo of Unfolding Region of Protein Chain via PyMOL - involving Pandas'. Then step through running the notebook. I suggest step through first running it with the settings there to see how it works and then change the settings to what you want for the next run. It should be adjustable with some hand editing to use custom, private PDB files uploaded to the session, too.

I also added a version with Pandas and Biopython left out because I don't see the need and less dependencies will make it easier to run for those only having PyMOL. See the link to that in the list of available notebooks or within this post above or



(Link to static version of the executed notebooks: static notebook for original poster that uses Pandas and BioPython & static notebook for streamlined approach. I don't suggest using these because the environment is quite complex and I supply a place to launch a working session without installing anything here; however, I am supplying them so those with advanced familiarity can assess the code and results.)

ADD COMMENT

Login before adding your answer.

Traffic: 3483 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6