I have about 400 bacteriophage genomes. For each phage, I have predicted the secondary structure of every gene using RaptorX.
RaptorX predicts secondary structure probabilistically: for each amino acid, it reports a probability for each of 8 structure states.
For instance, the prediction for the 16th amino acid (R, arginine) of gene 3 of phage Tweety is:
16 R H 0.893 0.010 0.000 0.000 0.000 0.087 0.004 0.005
Each of the 8 numeric columns is the probability of one state. In this case, the program predicted a helix (H). To study the synteny of these phages, I want to compare the predictions for each gene against those for every other gene. Because this is computationally intensive, I plan to store the comparisons in a database so they never have to be recomputed. However, if you have a better suggestion, I'd greatly appreciate your input.
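A minimal sketch of the caching idea, using Python's built-in `sqlite3` module: each unordered gene pair is stored once, so n genes need at most n(n-1)/2 comparisons, and a rerun only computes pairs that are missing. The table `pair_score` and the functions `open_cache`, `cached_score` and `fill_all_pairs` are hypothetical, and `compare` stands for whatever scoring function you settle on (assumed to be symmetric).

```python
import sqlite3
from itertools import combinations

# Hypothetical schema: one row per unordered gene pair.
def open_cache(path=":memory:"):
    """Open (or create) the score cache; pass a file path to persist it."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pair_score ("
        " gene_a TEXT NOT NULL, gene_b TEXT NOT NULL, score REAL NOT NULL,"
        " PRIMARY KEY (gene_a, gene_b))"
    )
    return conn

def cached_score(conn, a, b, compare):
    """Return the score for a gene pair, calling `compare` only on a cache miss.

    Assumes `compare` is symmetric, so (a, b) and (b, a) share one row.
    The caller is responsible for committing.
    """
    a, b = sorted((a, b))
    row = conn.execute(
        "SELECT score FROM pair_score WHERE gene_a = ? AND gene_b = ?", (a, b)
    ).fetchone()
    if row is not None:
        return row[0]
    score = compare(a, b)
    conn.execute("INSERT INTO pair_score VALUES (?, ?, ?)", (a, b, score))
    return score

def fill_all_pairs(conn, gene_ids, compare):
    """Score every unordered pair: n genes give n*(n-1)/2 comparisons."""
    with conn:  # commits once at the end, or rolls back on error
        for a, b in combinations(gene_ids, 2):
            cached_score(conn, a, b, compare)
```

Rerunning `fill_all_pairs` after adding new phages costs one lookup per pair already stored and only calls `compare` for the new pairs.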
My main question is how I should compare one gene against another. It seems that the probability values could give a more powerful comparison than the single predicted label, so I'd like to use all 8 values. What would be a powerful way to take all of them into account when comparing one gene's predictions against another phage's gene?
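One option, sketched here under stated assumptions, is to treat each gene as an L x 8 profile and align two profiles with dynamic programming, scoring each pair of aligned residues by how similar their 8-state distributions are. The column score is 1 minus the Jensen-Shannon divergence (base 2), and the alignment is plain Needleman-Wunsch with a linear gap penalty; both are swapped-in choices, not something RaptorX provides. The functions `parse_ss8_line`, `profile_from_lines`, `js_similarity` and `align_profiles`, and the `gap` and `offset` defaults, are hypothetical; the parser assumes the whitespace-separated layout of the example line above.

```python
import math

# Assumption: each data line looks like the example in the question:
# index, residue, predicted label, then 8 state probabilities.

def parse_ss8_line(line):
    """Split '16 R H 0.893 ...' into (index, residue, label, list of 8 probs)."""
    fields = line.split()
    probs = [float(x) for x in fields[3:11]]
    return int(fields[0]), fields[1], fields[2], probs

def profile_from_lines(lines):
    """Build an L x 8 profile, skipping blank lines and '#' comment lines."""
    return [parse_ss8_line(line)[3] for line in lines
            if line.strip() and not line.lstrip().startswith("#")]

def _normalize(p):
    # Rounded output may not sum exactly to 1 (the example sums to 0.999).
    total = sum(p)
    return [x / total for x in p]

def js_similarity(p, q):
    """1 - Jensen-Shannon divergence (base 2): 1.0 for identical distributions,
    0.0 for distributions with disjoint support."""
    p, q = _normalize(p), _normalize(q)
    m = [(x + y) / 2 for x, y in zip(p, q)]

    def kl(a):
        # Terms with a_i = 0 contribute nothing; m_i > 0 whenever a_i > 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, m) if x > 0)

    return 1.0 - 0.5 * kl(p) - 0.5 * kl(q)

def align_profiles(a, b, gap=0.5, offset=0.5):
    """Global (Needleman-Wunsch) alignment score of two profiles.

    Aligned residues score js_similarity - offset, so similar columns add
    and dissimilar ones subtract; every gap position costs `gap`.
    Hypothetical defaults: tune gap and offset on known homologous pairs.
    """
    n, m = len(a), len(b)
    prev = [-gap * j for j in range(m + 1)]  # row 0: b residues against gaps
    for i in range(1, n + 1):
        cur = [-gap * i] + [0.0] * m
        for j in range(1, m + 1):
            cur[j] = max(
                prev[j - 1] + js_similarity(a[i - 1], b[j - 1]) - offset,  # align
                prev[j] - gap,     # a[i-1] against a gap
                cur[j - 1] - gap,  # b[j-1] against a gap
            )
        prev = cur
    return prev[m]
```

Each comparison costs O(L_a x L_b), which is why caching the scores is worthwhile. Because phage genes differ in length, you may also want to divide the score by the length of the shorter gene, or switch to a local (Smith-Waterman) alignment if only parts of two genes are expected to match.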