Different length but same sequence (PDB)

7.7 years ago

Expe ▴ 10

Hi,

I have started working with PDB files and Biopython, and I can't figure out why the sequence length differs between the FASTA file and the PDB file for the same protein. An example is the protein 5dj7: in the FASTA file the length is 230, whereas from the PDB file I get 593. To find the length from the PDB file I used the following code, and I don't know if I am interpreting the result correctly.

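from Bio.PDB import PDBParser

parser = PDBParser()
pdb_f = "5dj7.pdb"
structure = parser.get_structure('5dj7', pdb_f)
model = structure[0]

for model in structure:
    for chain in model:
        if chain == model['A']:
            # print the number of residues Biopython reports for chain A
            print(len(chain))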

Thanks in advance!

pdb fasta sequence length biopython • 1.9k views


The PDB file likely contains multiple chains or models, depending on how the structure was solved.

You also may not get exact multiples of the expected length, because post-translational processing may have removed residues, etc. Open the PDB file in a viewer like PyMOL or Chimera to examine it.
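To check for multiple chains or models without opening a viewer, something like the following could be used. It is only a sketch: `chain_lengths` is a hypothetical helper name, and it assumes `structure` is the object returned by `parser.get_structure(...)` in the question.

```python
def chain_lengths(structure):
    """Map (model id, chain id) to the number of residue entries in that chain."""
    lengths = {}
    for model in structure:
        for chain in model:
            # len(chain) counts every residue entry in the chain,
            # which includes waters and ligands as well as amino acids
            lengths[(model.id, chain.id)] = len(chain)
    return lengths
```

Printing `chain_lengths(structure)` shows one entry per chain in each model, so it is easy to see whether the file holds more than one copy of the protein.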

ADD REPLY • link 7.7 years ago by Joe 21k

I think the chain also includes water molecules. In that case 593 - 230 = 363 of the entries would be waters, which len(chain) counts along with the amino acid residues.
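One way to check this is to split the entries of the chain by their hetero flag. Biopython stores it as the first element of `residue.id`: `" "` for standard residues, `"W"` for water, and `"H_..."` for other hetero groups. The helper names below (`classify_residue`, `count_residue_types`, `load_chain`) are made up for this sketch.

```python
from collections import Counter


def classify_residue(residue):
    """Classify a Bio.PDB residue by the hetero flag in residue.id[0]."""
    hetfield = residue.id[0]
    if hetfield == " ":
        return "standard"  # amino acids or nucleotides
    if hetfield == "W":
        return "water"
    return "hetero"  # "H_xxx": ligands, ions, modified residues


def count_residue_types(chain):
    """Count standard, water and other hetero entries in a chain."""
    return Counter(classify_residue(res) for res in chain)


def load_chain(pdb_path, chain_id="A", model_index=0):
    """Parse a PDB file and return one chain of one model."""
    # Imported here so the counting helpers above work without Biopython
    from Bio.PDB import PDBParser

    parser = PDBParser(QUIET=True)
    structure = parser.get_structure("structure", pdb_path)
    return structure[model_index][chain_id]
```

With the file from the question, `count_residue_types(load_chain("5dj7.pdb"))` would give the breakdown for chain A. The "standard" count can still differ from the FASTA length if some residues were not resolved in the structure.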

ADD REPLY • link 7.7 years ago by fishgolden ▴ 520
