6 months ago
Nafi
I am trying to build a dataset from the RCSB PDB for my research paper.
I have noticed that many PDB entries have fewer residues in a protein chain than the full FASTA sequence. Most likely, those residues were left unmodeled because they could not be resolved in the crystal structure. For example, in chain B of 1ZM1, the last few residues are unmodeled.
Should I use the shortened sequence taken from the PDB structure, or the full FASTA sequence, for my dataset?
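To make the difference concrete, here is a minimal sketch of how the two sequences could be compared for a legacy PDB-format file. It assumes the standard fixed-column layout of `SEQRES` records (the full sequence) and `ATOM` records (the modeled residues). The helper names and the toy five-residue chain are made up for illustration and are not real 1ZM1 data.

```python
# Map of the 20 standard residue names to one-letter codes; anything else becomes "X".
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_sequences(pdb_text):
    """Full sequence per chain, read from SEQRES records."""
    residues = {}
    for line in pdb_text.splitlines():
        if line.startswith("SEQRES"):
            chain = line[11]  # column 12: chain ID
            residues.setdefault(chain, []).extend(line[19:].split())
    return {c: "".join(THREE_TO_ONE.get(r, "X") for r in rs)
            for c, rs in residues.items()}


def modeled_sequences(pdb_text):
    """Modeled sequence per chain, one letter per residue that has a CA atom."""
    seen = set()
    letters = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]                # column 22: chain ID
            key = (chain, line[22:27])      # residue number + insertion code
            if key in seen:                 # skip alternate locations
                continue
            seen.add(key)
            letters.setdefault(chain, []).append(
                THREE_TO_ONE.get(line[17:20], "X"))
    return {c: "".join(v) for c, v in letters.items()}


def toy_pdb():
    """Hypothetical chain B: five residues in SEQRES, only the first three modeled."""
    full = ["MET", "ALA", "GLY", "SER", "LYS"]
    lines = [f"SEQRES {1:>3} B {len(full):>4}  " + " ".join(full)]
    for i, res in enumerate(full[:3], start=1):
        lines.append(f"ATOM  {i:>5}  CA  {res} B{i:>4}    ")
    return "\n".join(lines)


text = toy_pdb()
full_seq = seqres_sequences(text)["B"]
modeled_seq = modeled_sequences(text)["B"]
print(full_seq, modeled_seq, len(full_seq) - len(modeled_seq))
```

On a real entry, the same two functions could be run on the downloaded `.pdb` text to see, per chain, how many residues are present in the deposited sequence but absent from the coordinates.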
What analysis are you trying to use the data for?
Protein attribute prediction