Similarity between DNA strings of two different classes

5.1 years ago

Gene_MMP8 ▴ 240

I have two sets of DNA strings of length 10 belonging to two different classes (say, class 1 and class 2). A few examples from the dataset are given below:

DNA_str_1    class
ATTGGCGGCA      1  
TAGGCGGGGC      2  
ATTGCGCTGT      1 
TAGGAGGAAG      2

Now, I want to construct a similarity matrix and check how different the strings are between the two classes. Ideally, the heatmap of this matrix would show two clear rectangular blocks, indicating that DNA strings belonging to the same class are very similar and those belonging to different classes are dissimilar. I thought about using stringdist(method="hamming") in R to first construct the matrix and then plot the heatmap. Is this the right approach?

sequencing similarity • 845 views


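Yes, the Hamming distance is what you are looking for: all your strings have the same length, so it simply counts the positions at which two strings differ. The alternative is to align the sequences with the Needleman-Wunsch algorithm first and then use the alignment scores for the heatmap.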
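For what it's worth, here is a minimal sketch of the Hamming route in R. It assumes the stringdist package is installed and uses only the four example sequences from the question; the variable names are illustrative. The sequences are sorted by class first and the heatmap is drawn without clustering, so that same-class strings end up in contiguous blocks.

```r
# Sketch only: sequences are the four examples from the question;
# variable names (seqs, class, ord, d) are illustrative.
library(stringdist)

seqs  <- c("ATTGGCGGCA", "TAGGCGGGGC", "ATTGCGCTGT", "TAGGAGGAAG")
class <- c(1, 2, 1, 2)

# Order by class so that same-class strings sit next to each other
ord   <- order(class)
seqs  <- seqs[ord]
class <- class[ord]

# Pairwise Hamming distances (number of mismatching positions)
d <- stringdistmatrix(seqs, seqs, method = "hamming")
dimnames(d) <- list(seqs, seqs)

# Plot without clustering or rescaling so the class order is preserved
heatmap(d, Rowv = NA, Colv = NA, scale = "none", symm = TRUE,
        RowSideColors = c("steelblue", "tomato")[class])
```

Note that this is a distance matrix, so low values mean similar strings. Whether two clean blocks actually appear depends on the data; printing `d` before plotting is a quick sanity check.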

5.1 years ago by Ram 45k
