I recently read a paper on DNA-binding residue prediction (DRNApred), and I would like to know how the 20 observed frequencies for each amino acid are obtained via HHblits. From my understanding, you run HHblits with the input sequence against your database to generate an MSA file, then derive the 20 observed frequencies from that MSA with some post-processing. The command would be something like:
hhblits -i input_sequence.fasta -d /path/to/nr -o output.hhr -oa3m output.a3m
- Is this chain of thought correct?
- Are there any ready-made tools that convert the MSA file to an observed frequency matrix?
- Is the resulting matrix similar to a PSSM generated via PSI-BLAST?
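
To make the "some processing" step concrete, here is a minimal sketch of what I have in mind. It assumes the A3M output marks insertions relative to the query as lowercase letters, so dropping them leaves every sequence at the query length. It then counts residues per column with no sequence weighting and no pseudocounts, which may well differ from what HHblits or DRNApred actually do. The function names `read_a3m` and `observed_frequencies` are my own, not part of any tool.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # column order of the 20 frequencies


def read_a3m(lines):
    """Return aligned sequences from A3M text, dropping insertion states."""
    seqs, current = [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current:
                seqs.append("".join(current))
                current = []
        else:
            # Lowercase letters (and '.') are insertions relative to the query.
            current.append("".join(c for c in line if not c.islower() and c != "."))
    if current:
        seqs.append("".join(current))
    return seqs


def observed_frequencies(seqs):
    """Per-column raw frequencies of the 20 amino acids, ignoring gaps."""
    matrix = []
    for col in range(len(seqs[0])):
        counts = Counter(s[col] for s in seqs if s[col] in AMINO_ACIDS)
        total = sum(counts.values())
        matrix.append([counts[a] / total if total else 0.0 for a in AMINO_ACIDS])
    return matrix


# Toy alignment (made up for illustration, not real HHblits output).
example_a3m = """>query
ACD
>hit1
AcC-
>hit2
GCD
"""

freqs = observed_frequencies(read_a3m(example_a3m.splitlines()))
for row in freqs:
    print(" ".join(f"{value:.2f}" for value in row))
```

Each printed row corresponds to one query position and each column to one amino acid in the order given by `AMINO_ACIDS`. Is this roughly the right idea, or is a weighting scheme needed to match the paper?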