Question

How to extract a FASTA file from a PDB file and obtain only the protein sequence

0

4.6 years ago

geethus2009 • 0

Is there any Python code to extract a FASTA file from PDB for a given protein ID (e.g., 1mkp)?

alignment sequence assembly • 11k views

updated 3.4 years ago by GenoMax 147k • written 4.6 years ago by geethus2009 • 0

score 2 · Answer 1 · 2020-04-30

2

4.6 years ago

Mensur Dlakic ★ 28k

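import sys
from Bio import SeqIO

# The path to the PDB file is taken from the first command-line argument.
PDBFile = sys.argv[1]
with open(PDBFile, 'r') as pdb_file:
    # 'pdb-atom' builds one record per chain from the ATOM coordinates.
    for record in SeqIO.parse(pdb_file, 'pdb-atom'):
        print('>' + record.id)
        print(record.seq)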

Save as pdb-seq.py. Download PDB coordinates for 1mkp and type:

python pdb-seq.py 1mkp.pdb

>1MKP:A
ASFPVEILPFLYLGCAKDSTNLDVLEEFGIKYILNVTPNLPNLFENAGEFKYKQIPISDHWSQNLSQFFPEAISFIDEARGKN
CGVLVHSLAGISRSVTVTVAYLMQKLNLSMNDAYDIVKMKKSNISPNFNFMGQLLDFERTL
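
The question asks for a PDB ID rather than a local file, so the download step can be scripted as well. The following is a minimal sketch, not part of the original answer. It assumes network access, an installed Biopython, and that RCSB serves entries at files.rcsb.org/download/<ID>.pdb. The helper names pdb_url, to_fasta, and fetch_fasta are hypothetical.

```python
import io
import urllib.request


def pdb_url(pdb_id):
    """Build the RCSB download URL for a PDB ID (assumed URL layout)."""
    return "https://files.rcsb.org/download/" + pdb_id.upper() + ".pdb"


def to_fasta(header, sequence, width=60):
    """Format one FASTA record, wrapping the sequence at `width` characters."""
    seq = str(sequence)
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


def fetch_fasta(pdb_id):
    """Download a PDB entry and return FASTA text for all of its chains."""
    # Imported here so the two helpers above work without Biopython installed.
    from Bio import SeqIO

    with urllib.request.urlopen(pdb_url(pdb_id)) as response:
        text = response.read().decode("utf-8")
    records = SeqIO.parse(io.StringIO(text), "pdb-atom")
    return "".join(to_fasta(rec.id, rec.seq) for rec in records)


# Example call (requires network access and Biopython):
# print(fetch_fasta("1mkp"), end="")
```

Writing the returned text to a file such as 1mkp.fasta gives a file whose only content is the FASTA header and the protein sequence for each chain.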

4.6 years ago by Mensur Dlakic ★ 28k

0

The code should be for Python 3.

    for record in SeqIO.parse(pdb_file, 'pdb-atom'):
        ^
SyntaxError: unexpected EOF while parsing

Please resolve this problem for me as well.

4.6 years ago by geethus2009 • 0

1

geethus2009 wrote: "code should be for python 3"

This code works fine with Python 3.6 on my computer. Also, I think you may be under the wrong impression that I should be troubleshooting this even after providing full code for you.

4.6 years ago by Mensur Dlakic ★ 28k
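
One plausible cause of the reported SyntaxError, assuming the script was pasted or saved without the indented lines under the for statement: Python reaches the end of the input while still expecting the loop body. The sketch below reproduces that situation with the built-in compile(). The snippet string is an illustrative stand-in, not the poster's actual file, and the exact message text varies between Python versions.

```python
# A loop header with no indented body, as left behind by an incomplete paste.
snippet = "for record in records:\n"

try:
    compile(snippet, "<pasted>", "exec")
    error = None
except SyntaxError as exc:  # IndentationError is a subclass of SyntaxError
    error = exc

# Prints the exception class and message reported by this interpreter.
print(type(error).__name__, "-", error.msg if error else "no error")
```

If this is the cause, saving the complete script with both print lines indented under the for statement should make the error go away.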

0

Good day!

Thanks for the script.

It works, but I have a question. How can I get the FASTA sequence of a specific selection of the PDB file? For example, if I have a chain with 100 residues but only want the FASTA sequence of the first 10 residues, how can I do that?

Thank you so much.

Regards, Brandon U.

3.4 years ago by Brandon Usuga • 0

0

You can modify Mensur Dlakic's code as follows. This will get you the first 10 amino acids of each chain.

import sys
from Bio import SeqIO

PDBFile = sys.argv[1]
with open(PDBFile, 'r') as pdb_file:
    for record in SeqIO.parse(pdb_file, 'pdb-atom'):
        print('>' + record.id)
        # [:10] keeps only the first 10 residues of the chain.
        print(record.seq[:10])

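The [:10] slice is what makes this possible. You can use any appropriate interval, e.g. [4:24], to get other sections of the sequence. Python slices are 0-based and exclude the end index, so [4:24] returns the 5th through 24th residues of the parsed sequence, which is not necessarily the same as residues 5 to 24 in the PDB file's own numbering.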
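
Slice positions count from the first residue that has coordinates, which matters for entries like 1mkp where the chain does not start at residue 1. The following is a minimal sketch for selecting by PDB residue number instead. The function slice_by_residue_number is hypothetical, and it assumes the sequence has exactly one letter per residue number (no insertion codes, gaps already padded).

```python
def slice_by_residue_number(seq, first_resnum, start, end):
    """Return the residues numbered start..end (inclusive) in PDB numbering.

    seq          -- chain sequence (str or Bio.Seq.Seq), one letter per residue
    first_resnum -- PDB residue number of seq[0]
    """
    lo = start - first_resnum      # 0-based index of the first wanted residue
    hi = end - first_resnum + 1    # slice end is exclusive, hence the +1
    if lo < 0 or hi > len(seq) or lo >= hi:
        raise ValueError("requested residues fall outside the parsed chain")
    return seq[lo:hi]


# Toy example: a chain whose first observed residue is numbered 204.
print(slice_by_residue_number("ASFPVEILPF", 204, 206, 208))  # FPV
```

To my understanding, records parsed with 'pdb-atom' store the first residue number in record.annotations["start"], which could be passed as first_resnum. Treat that as an assumption and check it against your Biopython version.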

3.4 years ago by GenoMax 147k