Find mapping between amino acid indices in a PDB file and a sequence
5.9 years ago
JJP • 0

Hi All,

I am a beginner in Biopython. What I am trying to do is the following:

I have a sequence of amino acids (including gaps) and a corresponding PDB file. The numbering of amino acids in the PDB file does not match the numbering of the amino acids in the sequence list. I want to find the index of each amino acid entry in the PDB file and find the corresponding number in the sequence. For example, if the first entry in the PDB file is Alanine, I want to find the corresponding index of Alanine in the sequence list. Also, for gaps (-), I want to set the index to zero.

Here is the sequence list I have:

-LLPYFDF----DVPRNLTVTVGQT-GFLHCRVERLGDK-----DVSWIRKR----------DLHILTAGGTTYTSDQRFQVLRP---------------------------------------DGSANWTLQIKYPQPRDSGVYECQINTEP-KMSLSYTFNVVE-IVDPKFSSPIVNMTAPVGRDAFLTCVVQDLGPYKVAWLRVDTQTILTIQNHVITKNQRIGIANSEH---KTWTMRIKDIKESDKGWYMCQINTDPMKSQMGYLDVV----

Here is what I have tried so far:

import argparse
import sys
import traceback

import numpy as np


def parseArgs():
    """Parse command line arguments"""

    try:
        parser = argparse.ArgumentParser(
            description='Read and extract items from input PDB file')

        parser.add_argument('-i',
                            '--input',
                            action='store',
                            required=True,
                            help='input PDB file in standard format')

    except Exception:
        print("An exception occurred with argument parsing. Check your provided options.")
        traceback.print_exc()
        sys.exit(1)

    return parser.parse_args()


# Reads a PDB file and returns the residue name and coordinates for
# each C-alpha atom
# (the input argument for this routine is the pdb file name.)

def get_coordinates_PDB(File_In):
    try:
        fl = open(File_In, 'r')
    except OSError:
        print('Could not open input file {0}'.format(File_In))
        sys.exit(1)

    Res = []
    Points = []

    # Getting from a PDB file
    for line in fl:
        if not line.startswith('ATOM'):
            continue
        elif line[13:15] != 'CA':
            continue
        resname = line[17:20]
        # x, y, z sit in fixed columns 31-38, 39-46 and 47-54 of an ATOM record
        tmp = np.zeros(3)
        tmp[0] = float(line[30:38])
        tmp[1] = float(line[38:46])
        tmp[2] = float(line[46:54])
        Res.append(resname)
        Points.append(tmp)
    fl.close()
    return Points, Res


def main():
    """Read and parse a provided PDB file."""

    # Parse arguments
    args = parseArgs()

    File_In = args.input

    print(get_coordinates_PDB(File_In))


if __name__ == '__main__':
    main()

This outputs the x,y,z coordinates and the amino acids in the PDB file. However, I am stalled at this point.

I would much appreciate it if someone could help me with implementing the rest. Thank you in advance for your time and help!

sequence PDB python • 2.4k views

There was a post several weeks ago that may be useful to you:

Using STDIN with BioPython's PDB methods
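
For the mapping itself, here is a minimal sketch in plain Python rather than Biopython. It rests on assumptions the question does not confirm: the PDB file holds a single chain with no alternate locations, and its C-alpha residues appear in the same order as the non-gap letters of the sequence, with none missing. The names `read_ca_residues`, `map_sequence_to_pdb` and `make_ca_line` are made up for illustration. If residues are missing from the structure, a real pairwise alignment would be needed instead of this simple walk.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    'ALA': 'A', 'ARG': 'R', 'ASN': 'N', 'ASP': 'D', 'CYS': 'C',
    'GLN': 'Q', 'GLU': 'E', 'GLY': 'G', 'HIS': 'H', 'ILE': 'I',
    'LEU': 'L', 'LYS': 'K', 'MET': 'M', 'PHE': 'F', 'PRO': 'P',
    'SER': 'S', 'THR': 'T', 'TRP': 'W', 'TYR': 'Y', 'VAL': 'V',
}


def read_ca_residues(pdb_lines):
    """Return (one_letter_code, residue_number) for each C-alpha ATOM record.

    Assumes a single chain without alternate locations (hypothetical helper).
    """
    residues = []
    for line in pdb_lines:
        if line.startswith('ATOM') and line[12:16].strip() == 'CA':
            resname = line[17:20]          # residue name, columns 18-20
            resnum = int(line[22:26])      # residue number, columns 23-26
            residues.append((THREE_TO_ONE.get(resname, 'X'), resnum))
    return residues


def map_sequence_to_pdb(gapped_seq, residues):
    """Give each sequence position its PDB residue number, or 0 for a gap."""
    mapping = []
    pos = 0  # index of the next unused PDB residue
    for letter in gapped_seq:
        if letter == '-':
            mapping.append(0)
            continue
        if pos >= len(residues):
            raise ValueError('sequence has more residues than the PDB file')
        code, resnum = residues[pos]
        if code != letter:
            raise ValueError(
                'mismatch at PDB residue {0}: sequence has {1}, PDB has {2}'
                .format(resnum, letter, code))
        mapping.append(resnum)
        pos += 1
    return mapping


def make_ca_line(serial, resname, resnum):
    """Build a toy C-alpha ATOM record prefix for the demo (chain A)."""
    return 'ATOM  {:>5d}  CA  {:>3s} A{:>4d}'.format(serial, resname, resnum)


# Toy data: three residues numbered 10-12 in the PDB, five sequence positions.
toy_pdb = [
    make_ca_line(1, 'LEU', 10),
    make_ca_line(2, 'LEU', 11),
    make_ca_line(3, 'PRO', 12),
]
toy_mapping = map_sequence_to_pdb('-LL-P', read_ca_residues(toy_pdb))
print(toy_mapping)  # [0, 10, 11, 0, 12]
```

With a real file, `read_ca_residues(open(File_In))` would take the place of the toy lines. The `ValueError` on a mismatch is deliberate: it flags the point where the same-order assumption stops holding instead of silently producing a shifted mapping.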
