Hi All,
I am a beginner in Biopython. What I am trying to do is the following:
I have a sequence of amino acids (including gaps) and a corresponding PDB file. The numbering of amino acids in the PDB file does not match the numbering of the amino acids in the sequence. I want to take each amino acid entry in the PDB file and find its corresponding index in the sequence. For example, if the first entry in the PDB file is alanine, I want to find the index of that alanine in the sequence. For gaps (-), I want to set the index to zero.
Here is the sequence I have:
-LLPYFDF----DVPRNLTVTVGQT-GFLHCRVERLGDK-----DVSWIRKR----------DLHILTAGGTTYTSDQRFQVLRP---------------------------------------DGSANWTLQIKYPQPRDSGVYECQINTEP-KMSLSYTFNVVE-IVDPKFSSPIVNMTAPVGRDAFLTCVVQDLGPYKVAWLRVDTQTILTIQNHVITKNQRIGIANSEH---KTWTMRIKDIKESDKGWYMCQINTDPMKSQMGYLDVV----
Here is what I have tried so far:
import argparse
import sys

import numpy as np


def parseArgs():
    """Parse command line arguments."""
    # argparse prints a usage message and exits on bad options,
    # so no try/except is needed around the parser setup.
    parser = argparse.ArgumentParser(
        description='Read and extract items from input PDB file')
    parser.add_argument('-i',
                        '--input',
                        action='store',
                        required=True,
                        help='input PDB file in standard format')
    return parser.parse_args()


# Reads a PDB file and returns the coordinates and residue name for
# each C-alpha atom
# (the input argument for this routine is the PDB file name.)
def get_coordinates_PDB(File_In):
    try:
        fl = open(File_In, 'r')
    except OSError:
        print('Could not open input file {0}'.format(File_In))
        sys.exit(1)
    Res = []
    Points = []
    # Getting from a PDB file
    with fl:
        for line in fl:
            if not line.startswith('ATOM'):
                continue
            # The atom name occupies columns 13-16 of an ATOM record
            if line[12:16].strip() != 'CA':
                continue
            resname = line[17:20]
            # x, y, z are fixed-width fields (columns 31-38, 39-46, 47-54);
            # reading them by position is safer than a regex, which can
            # fail when adjacent numbers run together
            xyz = np.array([float(line[30:38]),
                            float(line[38:46]),
                            float(line[46:54])])
            Res.append(resname)
            Points.append(xyz)
    return Points, Res


def main():
    """Read and parse a provided PDB file."""
    # Parse arguments
    args = parseArgs()
    File_In = args.input
    print(get_coordinates_PDB(File_In))


if __name__ == '__main__':
    main()
This outputs the x,y,z coordinates and the amino acids in the PDB file. However, I am stalled at this point.
I would much appreciate it if someone could help me with implementing the rest. Thank you in advance for your time and help!
There was a post several weeks ago that may be useful to you:
Using STDIN with BioPython's PDB methods
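For the index mapping itself, here is a minimal sketch in plain Python rather than a definitive solution. It assumes that the C-alpha residues read from the PDB file, taken in file order, spell out exactly the same sequence as your gapped sequence with the gaps removed; if that does not hold, it raises an error instead of guessing. The names THREE_TO_ONE and map_residues_to_alignment are hypothetical helpers I made up for illustration, not Biopython functions.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    'ALA': 'A', 'ARG': 'R', 'ASN': 'N', 'ASP': 'D', 'CYS': 'C',
    'GLN': 'Q', 'GLU': 'E', 'GLY': 'G', 'HIS': 'H', 'ILE': 'I',
    'LEU': 'L', 'LYS': 'K', 'MET': 'M', 'PHE': 'F', 'PRO': 'P',
    'SER': 'S', 'THR': 'T', 'TRP': 'W', 'TYR': 'Y', 'VAL': 'V',
}


def map_residues_to_alignment(pdb_resnames, aligned_seq, gap='-'):
    """Map PDB residues onto a gapped sequence (hypothetical helper).

    Assumes the PDB residues, in file order, match the ungapped sequence.
    Returns two lists:
      column_to_residue: one entry per sequence position; 0 for a gap,
          otherwise the 1-based index of the residue in the PDB list.
      residue_to_column: one entry per PDB residue; the 1-based position
          of that residue in the gapped sequence.
    """
    # Unknown residue names become 'X' so that they trigger a mismatch.
    pdb_seq = ''.join(THREE_TO_ONE.get(name.strip().upper(), 'X')
                      for name in pdb_resnames)
    ungapped = aligned_seq.replace(gap, '')
    if pdb_seq != ungapped:
        raise ValueError('PDB residues do not match the ungapped sequence')

    column_to_residue = []
    residue_to_column = []
    residue_index = 0
    for column, letter in enumerate(aligned_seq, start=1):
        if letter == gap:
            column_to_residue.append(0)
        else:
            residue_index += 1
            column_to_residue.append(residue_index)
            residue_to_column.append(column)
    return column_to_residue, residue_to_column


# Small made-up example: three residues against a five-column sequence.
col_to_res, res_to_col = map_residues_to_alignment(
    ['LEU', 'LEU', 'PRO'], '-LL-P')
print(col_to_res)  # [0, 1, 2, 0, 3]
print(res_to_col)  # [2, 3, 5]
```

You could pass the residue-name list returned by your own get_coordinates_PDB as pdb_resnames. If the structure is missing residues or covers only part of the sequence, the check above will fail, and you would probably need a real pairwise alignment (for example with Biopython's Bio.Align.PairwiseAligner) before building the index lists.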