Similarity between DNA strings of two different classes
4.6 years ago
Gene_MMP8 ▴ 240

I have two sets of DNA strings of length 10 belonging to two different classes (say, class 1 and class 2). A few examples from the dataset are given below:

DNA_str_1    class
ATTGGCGGCA      1  
TAGGCGGGGC      2  
ATTGCGCTGT      1 
TAGGAGGAAG      2

Now I want to construct a similarity matrix and check how different the strings are between the two classes. Ideally, a heatmap of the matrix would show two clear rectangular blocks, indicating that DNA strings within the same class are very similar and strings from different classes are dissimilar. I thought about using stringdist(method="hamming") in R to construct the matrix first and then plot the heatmap. Am I doing this correctly?

sequencing similarity • 754 views

Yes, the Hamming distance is what you are looking for here, since all of your strings have the same length. The alternative would be to align the sequences pairwise with the Needleman-Wunsch algorithm first and use the alignment scores for the heatmap.
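To make the Hamming approach concrete, here is a minimal sketch in plain Python (not R, and not the stringdist package) using only the four example strings from the question. The helper names `hamming` and `mean_distance` are made up for illustration. The rows are sorted by class first, which is what would make same-class blocks appear along the diagonal of a heatmap.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


# The four example strings from the question, with their class labels.
records = [
    ("ATTGGCGGCA", 1),
    ("TAGGCGGGGC", 2),
    ("ATTGCGCTGT", 1),
    ("TAGGAGGAAG", 2),
]

# Sort by class so that same-class strings are adjacent in the matrix.
records.sort(key=lambda r: r[1])
seqs = [s for s, _ in records]
labels = [c for _, c in records]
n = len(seqs)

# Symmetric pairwise distance matrix (0 on the diagonal).
dist = [[hamming(a, b) for b in seqs] for a in seqs]


def mean_distance(same_class: bool) -> float:
    """Mean distance over unordered pairs within (True) or between (False) classes."""
    vals = [
        dist[i][j]
        for i in range(n)
        for j in range(i + 1, n)
        if (labels[i] == labels[j]) == same_class
    ]
    return sum(vals) / len(vals)


for label, row in zip(labels, dist):
    print(label, row)
# 1 [0, 6, 7, 8]
# 1 [6, 0, 6, 8]
# 2 [7, 6, 0, 4]
# 2 [8, 8, 4, 0]

print("within-class mean: ", mean_distance(True))   # 5.0
print("between-class mean:", mean_distance(False))  # 7.25
```

Note that this is a distance, so small values mean similar strings; if you want a similarity instead, 1 - distance / 10 is one simple conversion for length-10 strings. On these four examples the within-class distances are lower on average than the between-class ones, but four strings are far too few to say whether the full dataset will give clean blocks.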
