How can I extract features from protein sequences so that each sequence can be converted into a vector for training a machine learning model? In some papers I found methods that use AAindex or PSSM to build the training data, but I was unable to find a detailed description of how they work. Please suggest some papers or links that could be helpful.
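To make the goal concrete, here is a minimal sketch of the kind of conversion I mean: it turns a sequence of any length into a fixed-length vector made of the amino acid composition plus the mean of one physicochemical property. The Kyte-Doolittle hydropathy scale is used as an example of the per-residue property tables that AAindex collects; the function names (composition, mean_property, featurize) are my own and do not come from any paper or library. It assumes a non-empty sequence containing only the 20 standard residues.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982), one per residue.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Fraction of each of the 20 standard residues (20 values)."""
    seq = seq.upper()
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def mean_property(seq, scale):
    """Average of a per-residue property over the whole sequence."""
    seq = seq.upper()
    # Raises KeyError on non-standard residues such as X or B.
    return sum(scale[a] for a in seq) / len(seq)


def featurize(seq):
    """Fixed-length vector: 20 composition values + 1 mean hydropathy."""
    return composition(seq) + [mean_property(seq, KD)]


vec = featurize("MKTAYIAKQR")
print(len(vec))  # 21
```

Every sequence gives 21 numbers regardless of its length, which is what most learning algorithms need. The drawback is that all information about residue order is lost.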
In the literature I found the following article:
VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines
It uses auto cross covariance (ACC). I have written the following Python code to calculate it. Please let me know whether it works correctly.
http://biotoolsinsilico.blogspot.in/2014/07/auto-cross-covariance-python.html
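For comparison, this is a compact sketch of ACC as I understand it: for each lag, the product of descriptor j at position i and descriptor k at position i + lag is summed over the sequence and divided by (n - lag). The case j = k gives the auto covariance terms and j != k gives the cross covariance terms. The function name acc_features and the TOY descriptor table are hypothetical, and the TOY numbers are made up purely for illustration; real per-residue descriptors would have to be substituted. Some formulations subtract each descriptor's mean first, which this sketch does not do.

```python
import numpy as np


def acc_features(z, max_lag):
    """Auto/cross covariance features.

    z: array of shape (n_residues, n_descriptors), one row per residue.
    Returns a vector of length n_descriptors**2 * max_lag.
    """
    z = np.asarray(z, dtype=float)
    n, d = z.shape
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    feats = []
    for lag in range(1, max_lag + 1):
        # cov[j, k] = sum_i z[i, j] * z[i + lag, k] / (n - lag)
        cov = z[:-lag].T @ z[lag:] / (n - lag)
        feats.append(cov.ravel())
    return np.concatenate(feats)


# Made-up two-descriptor table, for illustration only (not real values).
TOY = {"A": (0.1, -1.0), "K": (2.0, 0.5), "G": (-1.5, 0.3)}
z = np.array([TOY[a] for a in "AKGGAK"])
print(acc_features(z, max_lag=2).shape)  # (8,)
```

The output length depends only on the number of descriptors and the maximum lag, not on the sequence length, so sequences of different lengths map to vectors of the same size. The maximum lag has to be shorter than the shortest sequence in the data set.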