7.7 years ago
Expe
Hi,
I started working with PDB files and Biopython. I can't figure out why the sequence length differs between the FASTA file and the PDB file. An example is the protein 5dj7. In the FASTA file, the length is 230, whereas from the PDB file I get 593. To find the length from the PDB file I used the following code, and I don't know if I am interpreting it right.
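from Bio.PDB import PDBParser

parser = PDBParser(QUIET=True)
pdb_f = "5dj7.pdb"
structure = parser.get_structure("5dj7", pdb_f)
for model in structure:
    for chain in model:
        if chain.id == "A":
            # len(chain) counts every residue object in the chain, waters and ligands included
            print(len(chain))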
Thanks in advance!
The PDB likely has multiple chains or models depending on how the structure was resolved.
You may also not get exact multiples of the expected length, because post-translational processing may have removed residues, and so on. Open the PDB file in a viewer such as PyMOL or Chimera to examine it.
I think water molecules are being counted as part of the chain; in that case there would be 593 - 230 = 363 of them.
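One way to check the water hypothesis is to tally the residues of a chain by their hetero flag. The sketch below is only an illustration: it relies on Biopython storing that flag as the first element of `residue.id` (a blank for standard residues, `"W"` for water, `"H_..."` for other hetero groups), and the helper names `count_residue_kinds` and `chain_composition` are made up for this example.

```python
from collections import Counter


def count_residue_kinds(chain):
    """Tally residues in a chain by the hetero flag stored in residue.id[0]."""
    counts = Counter()
    for residue in chain:
        hetflag = residue.id[0]
        if hetflag == " ":
            counts["standard"] += 1      # amino acids / nucleotides from ATOM records
        elif hetflag == "W":
            counts["water"] += 1         # water molecules
        else:
            counts["other_hetero"] += 1  # ligands, ions, modified residues ("H_...")
    return counts


def chain_composition(pdb_path, chain_id="A"):
    """Parse a PDB file and return (len(chain), counts) for one chain of the first model."""
    # Imported here so the counting helper above works without Biopython installed.
    from Bio.PDB import PDBParser

    structure = PDBParser(QUIET=True).get_structure("query", pdb_path)
    chain = structure[0][chain_id]
    return len(chain), count_residue_kinds(chain)
```

Calling `chain_composition("5dj7.pdb", "A")` should show how much of the 593 comes from standard residues and how much from water or other hetero groups. The standard-residue count can still be smaller than the FASTA length, since the FASTA sequence may include residues that were not resolved in the structure.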