I have a number of protein models of varying lengths in PDB format, and I'm trying to use machine learning to predict their energies. I have the energy value for each of the protein models.
The problem is that most machine learning algorithms require a fixed-length vector representation, but all my protein models have different lengths.
Does anyone know of a fixed-length vector representation for proteins?
Hi Linus, thanks for the response. The sequences are actually not related to each other. Using properties of the proteins as features is tough because it would give bad predictions. I am interested in using the distances between the atoms in the protein model. Is there a standard way to represent a protein model as a fixed-length feature vector based on its atomic distances?
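
To make concrete what I mean by a distance-based vector, here is a rough sketch of one possibility: a normalized histogram of all pairwise C-alpha distances, which has the same length regardless of protein size. I don't know whether this counts as a standard approach. The function names (`ca_coords`, `distance_histogram`) and the parameters (`n_bins`, `max_dist`) are my own placeholders, and the parser assumes well-formed fixed-column ATOM records.

```python
import math

def ca_coords(pdb_text):
    """Collect C-alpha coordinates from PDB-format text.

    Assumes standard fixed-column ATOM records: atom name in
    columns 13-16, x/y/z in columns 31-38, 39-46 and 47-54.
    """
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append((float(line[30:38]),
                           float(line[38:46]),
                           float(line[46:54])))
    return coords

def distance_histogram(coords, n_bins=20, max_dist=40.0):
    """Return a normalized histogram of pairwise distances.

    The result always has n_bins entries, so models of different
    lengths map to vectors of the same size. n_bins and max_dist
    (in angstroms) are arbitrary choices, not recommended values.
    """
    counts = [0] * n_bins
    width = max_dist / n_bins
    n_pairs = 0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            # Distances beyond max_dist are clamped into the last bin.
            counts[min(int(d / width), n_bins - 1)] += 1
            n_pairs += 1
    if n_pairs == 0:
        return [0.0] * n_bins
    # Divide by the number of pairs so protein size does not scale the vector.
    return [c / n_pairs for c in counts]
```

Calling `distance_histogram(ca_coords(open(path).read()))` on each model (with `path` pointing at a PDB file) would give one 20-element vector per model. The obvious drawback is that a histogram throws away which residues are close to which, so I'm not sure it keeps enough information to predict energy.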