I have two sets of DNA strings of length 10 belonging to two different classes (say, class 1 and class 2). A few examples from the dataset are given below:
DNA_str_1 class
ATTGGCGGCA 1
TAGGCGGGGC 2
ATTGCGCTGT 1
TAGGAGGAAG 2
Now I want to construct a similarity matrix and check how different the strings in the two classes are. Specifically, I want a similarity matrix whose heatmap shows two clear rectangular blocks, indicating that DNA strings in the same class are very similar to each other and strings in different classes are dissimilar. I thought about using stringdist(method="hamming") in R to construct the matrix first and then plot the heatmap. Is this the right approach?
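A minimal sketch of this plan follows. It is written in Python rather than R so that it is self-contained. It computes pairwise Hamming distances, which should match what `stringdist` with `method = "hamming"` returns for each pair. It then turns each distance into a similarity by counting matching positions (length minus distance), because a distance is large for dissimilar strings. Finally, it sorts the strings by class so that same-class rows and columns sit next to each other. The names `hamming` and `similarity_matrix` are illustrative, and the data are only the four example strings above.

```python
# Sketch: pairwise Hamming similarity between equal-length DNA strings,
# ordered by class so same-class pairs form contiguous blocks in a heatmap.

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def similarity_matrix(seqs):
    """Similarity = length - Hamming distance, i.e. number of matching positions."""
    return [[len(a) - hamming(a, b) for b in seqs] for a in seqs]


# The four example strings from the question, with their class labels.
data = [
    ("ATTGGCGGCA", 1),
    ("TAGGCGGGGC", 2),
    ("ATTGCGCTGT", 1),
    ("TAGGAGGAAG", 2),
]

# Group rows/columns by class; without this ordering (or clustering)
# a heatmap cannot show rectangular blocks.
data.sort(key=lambda item: item[1])
seqs = [s for s, _ in data]
labels = [c for _, c in data]

sim = similarity_matrix(seqs)
for label, row in zip(labels, sim):
    print(label, row)
```

The resulting matrix can be passed to any heatmap function. With only these four strings the blocks are not clean. For example, the class 1 string ATTGCGCTGT matches the class 2 string TAGGCGGGGC in as many positions (4) as it matches the other class 1 string.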
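Yes, the Hamming distance is what you are looking for. All your strings have the same length (10), and the Hamming distance counts the positions at which two strings differ. The alternative is to first align the sequences with the Needleman-Wunsch algorithm and then use the alignment scores for the heatmap.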
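For the alignment route, the following is a minimal Needleman-Wunsch sketch that returns only the optimal global alignment score for each pair. The scoring scheme (match +1, mismatch -1, linear gap penalty -1) is an assumption chosen for illustration and is not specified in the answer. The function names `nw_score` and `score_matrix` are hypothetical. Unlike the Hamming distance, a higher score here means more similar, so the matrix can be used directly as a similarity matrix.

```python
# Sketch: Needleman-Wunsch global alignment score between two DNA strings.
# Assumed scoring: match +1, mismatch -1, linear gap -1 (illustrative only).

def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Return the optimal global alignment score of a and b."""
    n, m = len(a), len(b)
    # dp[i][j] = best score for aligning a[:i] with b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        dp[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = dp[i - 1][j] + gap    # a[i-1] aligned to a gap
            left = dp[i][j - 1] + gap  # b[j-1] aligned to a gap
            dp[i][j] = max(diag, up, left)
    return dp[n][m]


def score_matrix(seqs):
    """Pairwise alignment scores; higher means more similar."""
    return [[nw_score(a, b) for b in seqs] for a in seqs]


seqs = ["ATTGGCGGCA", "ATTGCGCTGT", "TAGGCGGGGC", "TAGGAGGAAG"]
for row in score_matrix(seqs):
    print(row)
```

Because the ungapped alignment is one of the candidates, each score is at least (matches - mismatches) for that pair. Alignment can therefore only raise a pair's similarity relative to a position-by-position comparison.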