Model PDB to secondary structure?
2.9 years ago
Xylanaser ▴ 80

Hi, I want to ask: is there a way to get the secondary structure from a model structure PDB file (for example, one produced by Rosetta)?

    ATOM      3  C   MET A   1       0.070   1.338   0.521  1.00 15.64           C  
    ATOM      4  O   MET A   1       1.281   1.526   0.669  1.00 15.64           O  
    ATOM      5  CB  MET A   1       0.462  -1.076   1.108  1.00 15.64           C  
    ATOM      6  CG  MET A   1      -0.099  -2.349   1.714  1.00 15.64           C  
    ATOM      7  SD  MET A   1      -0.858  -2.048   3.329  1.00 15.64           S  
    ATOM      8  CE  MET A   1       0.564  -1.647   4.333  1.00 15.64           C  
    ATOM      9 1H   MET A   1      -2.039  -1.225   0.216  1.00 15.64           H  
    ATOM     10 2H   MET A   1      -2.265   0.383  -0.127  1.00 15.64           H  
    ATOM     11 3H   MET A   1      -1.146  -0.499  -0.970  1.00 15.64           H  
    ATOM     12  HA  MET A   1      -1.082   0.182   1.864  1.00 15.64           H  
    ATOM     13 1HB  MET A   1       0.912  -1.328   0.149  1.00 15.64           H  
    ATOM     14 2HB  MET A   1       1.257  -0.714   1.762  1.00 15.64           H  
    ATOM     15 1HG  MET A   1      -0.851  -2.772   1.049  1.00 15.64           H  
    ATOM     16 2HG  MET A   1       0.699  -3.080   1.838  1.00 15.64           H  
    ATOM     17 1HE  MET A   1       0.241  -1.433   5.352  1.00 15.64           H 

    |
    v

>example
HHHHHHHHHHHCCCCCCCCCCEEEEEEEE
2.9 years ago
Jiyao Wang ▴ 380

You could use the DSSP program to calculate the secondary structure. Alternatively, you could use iCn3D (https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html): load your PDB file with the menu “File > Open File > PDB File (appendable)”, then export the secondary structure with the menu “File > Save Files > Secondary Structure”. The whole process can be turned into a Node.js script and run from the command line.
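
Whichever route you take, DSSP's classic output is a fixed-column text file, so the secondary structure can also be read from it directly. The sketch below is an illustration under stated assumptions, not a complete parser: it assumes the classic DSSP format (not the mmCIF output that newer mkdssp versions can write), where the residue table follows a header line beginning with `#  RESIDUE`, the one-letter amino acid sits at column 14 and the DSSP code at column 17 (1-based), `!` in the amino-acid column marks a chain break, and lowercase letters mark disulfide-bonded cysteines. The function name `parse_dssp_classic` is made up for this example.

```python
def parse_dssp_classic(text):
    """Return (sequence, ss) strings from classic-format DSSP output (assumed layout)."""
    seq, ss = [], []
    in_residues = False
    for line in text.splitlines():
        if not in_residues:
            # The per-residue table starts after the "  #  RESIDUE AA STRUCTURE ..." header
            if line.split()[:2] == ["#", "RESIDUE"]:
                in_residues = True
            continue
        if len(line) < 17 or line[13] == "!":
            # Too short to hold an SS code, or '!' = chain break: not a residue
            continue
        aa = line[13]  # column 14 (1-based): one-letter amino acid
        if aa.islower():
            # DSSP labels disulfide-bonded cysteines with lowercase letters (a, b, ...)
            aa = "C"
        code = line[16]  # column 17 (1-based): DSSP secondary-structure code
        seq.append(aa)
        ss.append(code if code != " " else "-")  # blank = coil, written as '-'
    return "".join(seq), "".join(ss)
```

For a file written by DSSP, something like `seq, ss = parse_dssp_classic(open("model.dssp").read())` (hypothetical file name) would give the two strings; coil comes back as '-', matching what Biopython reports.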

2.9 years ago
Mensur Dlakic ★ 28k

It can be done using DSSP, but its output format is not easy to parse. Assuming that Biopython is installed and dssp is somewhere in $PATH, this script automates the process. It can be customized to apply various conversions from the 8-state DSSP codes into a 3-state (H/E/C) designation, as explained in the comments.

import os
import sys
import shutil
import argparse
from Bio.PDB import PDBParser
from Bio.PDB.DSSP import DSSP

parser = argparse.ArgumentParser(
                                 description='\n The program will extract DSSP secondary structure assignments.\n',
                                 epilog='\n \n',
                                 formatter_class=argparse.RawDescriptionHelpFormatter)
parser.add_argument(
                    'input_file', help='path to the input PDB file [required]')
args = parser.parse_args()

# Recent DSSP releases install the executable as mkdssp, older ones as dssp; look for either in $PATH.
dssp_exe = shutil.which('mkdssp') or shutil.which('dssp')
if dssp_exe is None:
    sys.stdout.write('\n dssp not found. It is available for download from https://github.com/cmbi/dssp \n\n')
    sys.exit(1)

if os.access(args.input_file, os.R_OK):
    p = PDBParser()
    pdb_file = args.input_file
    pdb_code = args.input_file.rsplit('.', 1)[0]
    structure = p.get_structure(pdb_code, pdb_file)
    model = structure[0]
    # Run DSSP on the first model, using whichever executable was found above
    dssp = DSSP(model, pdb_file, dssp=dssp_exe)

# DSSP data is accessed by a tuple - (chain id, residue id):
# The dssp data returned for a single residue is a tuple in the form:
#
# Tuple Index   Value
# 0     DSSP index
# 1     Amino acid
# 2     Secondary structure
# 3     Relative ASA
# 4     Phi
# 5     Psi
# 6     NH-->O_1_relidx
# 7     NH-->O_1_energy
# 8     O-->NH_1_relidx
# 9     O-->NH_1_energy
# 10    NH-->O_2_relidx
# 11    NH-->O_2_energy
# 12    O-->NH_2_relidx
# 13    O-->NH_2_energy

    aa_key = ''
    ss_key = ''

    # Concatenate the amino acid (index 1) and DSSP code (index 2) of every residue, in order
    for key in dssp.keys():
        aa_key += dssp[key][1]
        ss_key += dssp[key][2]

# Code                    Description
#   H                     Alpha Helix
#   B                     Beta Bridge (isolated)
#   E                     Strand
#   G                     Helix-3 (3-10 helix)
#   I                     Helix-5 (pi helix)
#   T                     Turn
#   S                     Bend
#   -                     Coil (blank in the raw DSSP file; Biopython reports it as '-')
#SS-Scheme 1: H,G,I->H ; E,B->E ; T,S->C
#SS-Scheme 2: H,G->H ; E,B->E ; I,T,S->C THIS IS DEFAULT
#SS-Scheme 3: H,G->H ; E->E ; I,B,T,S->C
#SS-Scheme 4: H->H ; E,B->E ; G,I,T,S->C
#SS-Scheme 5: H->H ; E->E ; G,I,B,T,S->C

    ss_key = ss_key.replace('-', 'C')
#Scheme 1
#    ss_key = ss_key.replace('G', 'H')
#    ss_key = ss_key.replace('I', 'H')
#    ss_key = ss_key.replace('B', 'E')
#    ss_key = ss_key.replace('T', 'C')
#    ss_key = ss_key.replace('S', 'C')
#Scheme 2
    ss_key = ss_key.replace('G', 'H')
    ss_key = ss_key.replace('B', 'E')
    ss_key = ss_key.replace('I', 'C')
    ss_key = ss_key.replace('T', 'C')
    ss_key = ss_key.replace('S', 'C')
#Scheme 3
#    ss_key = ss_key.replace('G', 'H')
#    ss_key = ss_key.replace('B', 'C')
#    ss_key = ss_key.replace('I', 'C')
#    ss_key = ss_key.replace('T', 'C')
#    ss_key = ss_key.replace('S', 'C')
#Scheme 4
#    ss_key = ss_key.replace('B', 'E')
#    ss_key = ss_key.replace('G', 'C')
#    ss_key = ss_key.replace('I', 'C')
#    ss_key = ss_key.replace('T', 'C')
#    ss_key = ss_key.replace('S', 'C')
#Scheme 5
#    ss_key = ss_key.replace('G', 'C')
#    ss_key = ss_key.replace('B', 'C')
#    ss_key = ss_key.replace('I', 'C')
#    ss_key = ss_key.replace('T', 'C')
#    ss_key = ss_key.replace('S', 'C')

    print(aa_key)
    print(ss_key)

else:
    parser.error(
        '\n\n !!! Input file "%s" does not exist !!!\n' %
        args.input_file)
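
If you'd rather not comment and uncomment blocks of replace() calls, the five schemes listed in the script's comments can also be written as translation tables. This is a sketch, not part of the original script: the names `SCHEMES` and `reduce_ss` are made up here, and it assumes the input is the 8-state string built from Biopython's DSSP output, where coil is '-'.

```python
# Hypothetical helper: the five 8-state -> 3-state schemes from the script's comments,
# expressed as translation tables so each scheme is applied in a single pass.
SCHEMES = {
    1: {"G": "H", "I": "H", "B": "E", "T": "C", "S": "C"},  # H,G,I->H; E,B->E; T,S->C
    2: {"G": "H", "B": "E", "I": "C", "T": "C", "S": "C"},  # H,G->H; E,B->E; I,T,S->C (default)
    3: {"G": "H", "B": "C", "I": "C", "T": "C", "S": "C"},  # H,G->H; E->E; I,B,T,S->C
    4: {"B": "E", "G": "C", "I": "C", "T": "C", "S": "C"},  # H->H; E,B->E; G,I,T,S->C
    5: {"G": "C", "B": "C", "I": "C", "T": "C", "S": "C"},  # H->H; E->E; G,I,B,T,S->C
}


def reduce_ss(ss_key, scheme=2):
    """Map an 8-state DSSP string (coil as '-') onto H/E/C using one of the schemes."""
    # Coil becomes 'C' in every scheme; a raw blank is mapped too, in case the string
    # came straight from a DSSP file rather than from Biopython.
    table = str.maketrans({"-": "C", " ": "C", **SCHEMES[scheme]})
    return ss_key.translate(table)


print(reduce_ss("HGIEBTS-"))            # scheme 2 -> HHCEECCC
print(reduce_ss("HGIEBTS-", scheme=1))  # scheme 1 -> HHHEECCC
```

Because translate() applies the whole table in one pass, the result does not depend on the order of the substitutions, which is something to keep in mind when chaining replace() calls. Codes that are not in the table pass through unchanged.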
2.9 years ago
Wayne ★ 2.1k

For anyone already working with PyMOL, you can access the assigned secondary structure as you iterate over the residues. This notebook offers a good demonstration of accessing those details via the PyMOL API. You can get an active form of that notebook running in Jupyter directly in your browser by going here, clicking "launch binder", and choosing 'Demo of Iterating over residue secondary structure' from the list of available notebooks after the session starts.

Note that PyMOL's abbreviations differ from DSSP's, so you'd need to convert them; see Mensur Dlakic's answer for one way to do that text replacement. (You'll also commonly see the str.replace method used instead of the string module's function; see here, which shows the difference. string.replace() exists only in Python 2, so on Python 3 the str method is the one that works.)
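
As a sketch of that conversion for PyMOL output, assuming PyMOL's usual per-atom `ss` codes ('H' for helix, 'S' for strand/sheet, 'L' or an empty string for loop), a small lookup table does the job. Working on a list of per-residue codes rather than one concatenated string matters here: an empty code would simply vanish from a joined string and shift every later position. The names `PYMOL_TO_HEC` and `pymol_ss_to_hec` are hypothetical, and the commented `cmd.iterate` lines only show one way the codes might be collected inside a PyMOL session; they are not run here.

```python
# Inside PyMOL (not run here), one code per residue could be collected via the CA atoms:
#   from pymol import cmd, stored
#   stored.ss = []
#   cmd.iterate("name CA", "stored.ss.append(ss)")

# Assumed PyMOL codes: 'H' helix, 'S' strand/sheet, 'L' loop, '' unassigned.
PYMOL_TO_HEC = {"H": "H", "S": "E", "L": "C", "": "C"}


def pymol_ss_to_hec(codes):
    """Convert a list of per-residue PyMOL ss codes to one H/E/C string."""
    # Anything not in the table is treated as coil
    return "".join(PYMOL_TO_HEC.get(code, "C") for code in codes)


print(pymol_ss_to_hec(["H", "H", "L", "S", "S", ""]))  # -> HHCEEC
```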

Related:

2.9 years ago
Jiyao Wang ▴ 380

You can now also use a simple Python script to retrieve the secondary structure information. The example script is at https://github.com/ncbi/icn3d/blob/master/icn3dpython/batch_export_ss.py. Install Selenium, Chrome, and ChromeDriver, then run the command "python3 batch_export_ss.py" as described at https://github.com/ncbi/icn3d/blob/master/icn3dpython.
