I expected this to be simple to code given the clear instructions in the Biopython file I referenced above, but that turned out not to be the case. Here is working code that downloads a structure and extracts its DSSP assignments. If you already have PDB files somewhere, skip the download step and pass their paths to the parsing and DSSP calls.
from Bio.PDB import PDBList, PDBParser
from Bio.PDB.DSSP import DSSP

# Download each structure in legacy PDB format; files are saved as ./pdb<id>.ent
pdb_dl = PDBList()
pdb_list = ['1ako']
for i in pdb_list:
    pdb_dl.retrieve_pdb_file(i, pdir='./', file_format='pdb', overwrite=True)

p = PDBParser()
for i in pdb_list:
    pdb_path = './pdb%s.ent' % i
    structure = p.get_structure(i, pdb_path)
    model = structure[0]  # DSSP is run on a single model; use the first one
    # Biopython calls the external DSSP program here, so it must be installed
    dssp = DSSP(model, pdb_path, file_type='PDB')

    # Each DSSP entry is a tuple: index 1 is the amino acid, index 2 the 8-state code
    sequence = ''
    sec_structure = ''
    for a_key in dssp.keys():
        sequence += dssp[a_key][1]
        sec_structure += dssp[a_key][2]
    print(i)
    print(sequence)
    print(sec_structure)

    # Reduce the 8-state codes to 3 states: helix (H), strand (E), coil (C)
    sec_structure = sec_structure.replace('-', 'C')
    sec_structure = sec_structure.replace('I', 'C')
    sec_structure = sec_structure.replace('T', 'C')
    sec_structure = sec_structure.replace('S', 'C')
    sec_structure = sec_structure.replace('G', 'H')
    sec_structure = sec_structure.replace('B', 'E')
    print(sec_structure)
The printout contains the PDB ID, the protein sequence, its original 8-state DSSP assignment, and the converted assignment after replacing the 8-state characters with 3-state ones.
1ako
MKFVSFNINGLRARPHQLEAIVEKHQPDVIGLQETKVHDDMFPLEEVAKLGYNVFYHGQKGHYGVALLTKETPIAVRRGFPGDDEEAQRRIIMAEIPSLLGNVTVINGYFPQGESRDHPIKFPAKAQFYQNLQNYLETELKRDNPVLIMGDMNISPTDLDIGIGEENRKRWLRTGKCSFLPEEREWMDRLMSWGLVDTFRHANPQTADRFSWFDYRSKGFDDNRGLRIDLLLASQPLAECCVETGIDYEIRSMEKPSDHAPVWATFRR
-EEEEEE-S-GGG-HHHHHHHHHHH--SEEEEE-----GGG--HHHHHHTT-EEEEEEETTEEEEEEEESS--SEEEESSTT--HHHHTTEEEEEEEETTEEEEEEEEE-----BTT-TTHHHHHHHHHHHHHHHHHHH--TTS-EEEEEE-----SGGGB-S-HHHHHHHHHHTBTTS-HHHHHHHHHHHHTTEEEHHHHHSTT--S--SB--TTTTHHHHT--B--EEEEEEHHHHTTEEEEEE-HHHHTSSS--SB--EEEEE--
CEEEEEECCCHHHCHHHHHHHHHHHCCCEEEEECCCCCHHHCCHHHHHHCCCEEEEEEECCEEEEEEEECCCCCEEEECCCCCCHHHHCCEEEEEEEECCEEEEEEEEECCCCCECCCCCHHHHHHHHHHHHHHHHHHHCCCCCCEEEEEECCCCCCHHHECCCHHHHHHHHHHCECCCCHHHHHHHHHHHHCCEEEHHHHHCCCCCCCCCECCCCCCHHHHCCCECCEEEEEEHHHHCCEEEEEECHHHHCCCCCCCECCEEEEECC
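As a side note, the chain of replace calls in the code above can probably be collapsed into a single translation table. This is only a sketch: the names EIGHT_TO_THREE and to_three_state are mine, not part of Biopython, and it assumes the same mapping as above (G to H, B to E, and '-', I, T, S to C, with H and E left alone). Other reduction schemes exist, for example ones that count I as helix, so adjust the table if you need a different one.

```python
# Same 8-state -> 3-state mapping as the replace() calls in the answer:
# '-', I, T, S -> C; G -> H; B -> E. H and E are not in the table, so they pass through.
EIGHT_TO_THREE = str.maketrans('-ITSGB', 'CCCCHE')


def to_three_state(sec_structure):
    """Return the 3-state (H/E/C) version of an 8-state DSSP string."""
    return sec_structure.translate(EIGHT_TO_THREE)


if __name__ == '__main__':
    # Short made-up fragment that exercises every 8-state character
    print(to_three_state('-EEE-S-GGG-HHHH--SBTI'))
```

In the loop above you would then write sec_structure = to_three_state(sec_structure) instead of the six replace lines.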
Hello, did you ever solve this?