4.6 years ago
geethus2009
Is there any Python code to extract a FASTA file from a PDB file for a given protein ID (e.g. 1mkp)?
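import sys
from Bio import SeqIO

# Path to the PDB file, given as the first command-line argument
PDBFile = sys.argv[1]
with open(PDBFile, 'r') as pdb_file:
    # 'pdb-atom' reads the sequence from the ATOM records, one record per chain
    for record in SeqIO.parse(pdb_file, 'pdb-atom'):
        print('>' + record.id)
        print(record.seq)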
Save as pdb-seq.py. Download the PDB coordinates for 1mkp and type:
python pdb-seq.py 1mkp.pdb
>1MKP:A
ASFPVEILPFLYLGCAKDSTNLDVLEEFGIKYILNVTPNLPNLFENAGEFKYKQIPISDHWSQNLSQFFPEAISFIDEARGKN
CGVLVHSLAGISRSVTVTVAYLMQKLNLSMNDAYDIVKMKKSNISPNFNFMGQLLDFERTL
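The download step can also be scripted. The following is only a minimal sketch using the standard library. It assumes the RCSB file server URL pattern https://files.rcsb.org/download/<ID>.pdb, and the helper names pdb_url and download_pdb are hypothetical, not part of the script above.

```python
import sys
import urllib.request


def pdb_url(pdb_id):
    """Build the download URL for a PDB entry (assumed RCSB URL pattern)."""
    return "https://files.rcsb.org/download/" + pdb_id.upper() + ".pdb"


def download_pdb(pdb_id):
    """Fetch the coordinates and save them as <pdb_id>.pdb in the current directory."""
    filename = pdb_id.lower() + ".pdb"
    urllib.request.urlretrieve(pdb_url(pdb_id), filename)
    return filename


# Only runs when a PDB ID is given on the command line, e.g. 1mkp
if __name__ == "__main__" and len(sys.argv) > 1:
    print(download_pdb(sys.argv[1]))
```

The saved file (for example 1mkp.pdb) can then be passed to pdb-seq.py as shown above.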
The code should be for Python 3.
for record in SeqIO.parse(pdb_file, 'pdb-atom'):
^
SyntaxError: unexpected EOF while parsing
Please resolve this; it is a problem for me as well.
This code works fine with Python 3.6 on my computer. Also, I think you may be under the wrong impression that I should be troubleshooting this even after providing the full code for you.
Good day!
Thanks for the script.
It works, but I have a question. How can I get the FASTA sequence of a specific selection of the PDB file? For example, if I have a chain with 100 residues but only want the FASTA sequence of the first 10 residues, how can I do that?
Thank you so much.
Regards, Brandon U.
You can modify Mensur Dlakic's code to get the first 10 AA. The [:10] slice added to the sequence is what makes this possible. You can use an appropriate interval, e.g. [4:24], to get other sections of the sequence.
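The modified code is not shown here, so the following is only a sketch of what the slicing change might look like, assuming the slice is applied to record.seq. The function names fasta_slice and print_slices are hypothetical.

```python
import sys


def fasta_slice(record_id, sequence, start=0, end=10):
    """Return a FASTA-formatted string for sequence[start:end].

    Uses Python slice semantics: start is 0-based and end is excluded.
    """
    return ">" + record_id + "\n" + str(sequence[start:end])


def print_slices(pdb_path, start=0, end=10):
    """Print the selected part of every chain found in the PDB file."""
    # Imported here so fasta_slice can be used without Biopython installed
    from Bio import SeqIO
    with open(pdb_path) as pdb_file:
        for record in SeqIO.parse(pdb_file, "pdb-atom"):
            print(fasta_slice(record.id, record.seq, start, end))


# Only runs when a PDB file is given on the command line
if __name__ == "__main__" and len(sys.argv) > 1:
    print_slices(sys.argv[1])
```

Note that the slice counts positions in the extracted sequence, not PDB residue numbers: [:10] gives positions 1 to 10, and [4:24] gives positions 5 to 24.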