Parsing Result File
2
0
Entering edit mode
13.9 years ago
Samimirza ▴ 60

Hi,

I am trying to match PDB records with results from dimplot.

I have managed to create a dictionary : pdb[chain][residue][residue_no]

Now I have some data in folders for each PDB, I am interested in parsing a particular file whose contents are shown below: for eg

pdb1aut: chains L and C


                                             Hydrogen bonds  Non-bonded
               Interacting residues          M-M   S-S   M-S  contacts
          ------------------------------     ---   ---   ---    -----  
Residues: [HIS  107  L] -> [PHE  208  C]      0     0     0        2

Residues: [PRO  144  L] -> [TRP  207  C]      0     0     0        1

Residues: [GLY  142  L] -> [TRP  207  C]      1     0     0        3

Now I want to match the residue details from dic created above and make: pdb[chain][residue][residue_no][M-M][S-S][M-S][non-bonded contacts]

i also need to count the number of such interactions for eg: TRP 207 C comes in line 2 and 3 both, so I need to add number of M-M interactions etc.

I still havenot written a code and I cant do anything complicated since I am a newbie in python.

python parsing pdb • 3.0k views
ADD COMMENT
2
Entering edit mode

One thing that would be nice is to show us your code so far. Making a dictionary that is nested that far is also a bad idea. It may be a better idea to build a class to hold the information for each residue.

ADD REPLY
3
Entering edit mode
13.9 years ago

GWW makes an excellent point about your code design. For learning purposes, it's best to post what code you have written so answers can build off of that. To get started, this implementation parses the file and organizes by top level chain and residues. The counts are collapsed with addition for identical residues, so the output for your example looks like:

% python2.6 parse_dimplot.py dimplot-out.txt
{('C', '207', 'TRP'): {'M-M': 1, 'M-S': 0, 'Non-bonded contacts': 4, 'S-S': 0},
 ('C', '208', 'PHE'): {'M-M': 0, 'M-S': 0, 'Non-bonded contacts': 2, 'S-S': 0},
 ('L', '107', 'HIS'): {'M-M': 0, 'M-S': 0, 'Non-bonded contacts': 2, 'S-S': 0},
 ('L', '142', 'GLY'): {'M-M': 1, 'M-S': 0, 'Non-bonded contacts': 3, 'S-S': 0},
 ('L', '144', 'PRO'): {'M-M': 0, 'M-S': 0, 'Non-bonded contacts': 1, 'S-S': 0}}

And the code:

import sys
import pprint

def _parse_chain_parts(c):
    return c.replace("[", "").replace(" -> ", "").split()

def _parse_stats(s):
    m_m, s_s, m_s, non_bond = s.rstrip().split()
    return {"M-M": int(m_m), "S-S": int(s_s), "M-S": int(m_s),
            "Non-bonded contacts": int(non_bond)}

def parse_dimplot_line(line):
    line = line.replace("Residues: ", "")
    chain1, chain2, stats_str = line.split("]")
    stats = _parse_stats(stats_str)
    for chain_str in [chain1, chain2]:
        residue, r_num, chain = _parse_chain_parts(chain_str)
        yield (chain, r_num, residue, stats)

def parse_dimplot(in_file):
    with open(in_file) as in_handle:
        for line in in_handle:
            if line.startswith("Residues:"):
                for chain in parse_dimplot_line(line):
                    yield chain

def combine_stats(one, two):
    combined = dict()
    for k, val in one.items():
        combined[k] = val + two[k]
    return combined

in_file = sys.argv[1]
organized = dict()
for chain, r_num, residue, stats in parse_dimplot(in_file):
    r_key = (chain, r_num, residue)
    if organized.has_key(r_key):
        organized[r_key] = combine_stats(stats, organized[r_key])
    else:
        organized[r_key] = stats
pprint.pprint(organized)
ADD COMMENT
0
Entering edit mode

Thank you for the code, I did a simpler version of the code and created an intermediate file, which I am trying to annotate.

ADD REPLY
1
Entering edit mode
13.9 years ago
Lee Katz ★ 3.2k

You might decide to just use a standard library like BioPerl. If using Perl, you can create a mapping hash however you want and then proceed with your analysis using the code samples below.

my %map=(atom1=>[0,0,0,2], ...);

http://www.bioperl.org/Core/Latest/bptutorial.html#iii_9_1_using_3d_structure_objects_and_reading_pdb_files__structurei__structure__io_

  use Bio::Structure::IO;
  use strict;
  my $structio = Bio::Structure::IO->new(-file => $file);
  my $struc = $structio->next_structure;
  for my $chain ($struc->get_chains) {
     my $chainid = $chain->id;
     # one-letter chaincode if present, 'default' otherwise
     for my $res ($struc->get_residues($chain)) {
        my $resid = $res->id;
        # format is 3-lettercode - dash - residue number, e.g. PHE-20
        my $atoms = $struc->get_atoms($res);
        # actually a list of atom objects, used here to get a count
        print join "\t", $chainid,$resid,$atoms,"\n";
     }
  }

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Structure/IO/pdb.html

$stream = Bio::Structure::IO->new(-file => $filename,
                                  -format => 'PDB');

while (my $structure = $stream->next_structure) {
        # do something with $structure
}
ADD COMMENT
0
Entering edit mode

I am a newbie, so currently not well versed with complicated programming and biopython.

ADD REPLY

Login before adding your answer.

Traffic: 2626 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6