Question

Calculate Extent Of Sequence Similarity.

11.7 years ago

matray312 ▴ 10

I have a problem and was wondering whether it could be solved by one of the techniques/algorithms used in bioinformatics to measure the extent of similarity. I am a

Problem Statement: We have a sensor (like a magnetic compass, with a dial divided into twelve equal zones of 30 degrees each) that outputs, once every second, where it's pointing.

The typical random output of the sensor may look like (for example)

30 30 30 30 30 30 120 120 120 120 120 120 60 60 60 60 60 60 330 330 330 330 30 30 30 30 210 210 210 210 210 60 60 60 60 60 60 60 60 60 60 60 60 60 ... etc.

We wanted to see if we can calculate a measure of similarity between two 4-minute sequence samples taken at different times during the day. (It would be great if we could state something like: the sequences are similar, and there is a 1 in a million (say) chance that we may be wrong.)

dna algorithm sequence • 2.9k views

Updated 2.2 years ago by Ram 45k • written 11.7 years ago by matray312 ▴ 10

Ram · Answer 1 · 2013-09-03

Perhaps you could do a correlation analysis, which would be easiest in R using the cor.test function. It reports the correlation coefficient, which tells you how well the two outputs are correlated, together with a p-value for the null hypothesis that there is no correlation.

Example code shown below:

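# Two example sensor readouts in degrees, one value per second
sensor_output_1 = c(30,30,30,30,30,30,120,120,120,120)
sensor_output_2 = c(30,45,30,30,120,120,100,100,120,120)
# One-sided test: is the true correlation greater than 0?
cor.test(sensor_output_1, sensor_output_2, alternative='greater')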

    Pearson's product-moment correlation

data:  sensor_output_1 and sensor_output_2 
t = 2.0324, df = 8, p-value = 0.03829
alternative hypothesis: true correlation is greater than 0 
95 percent confidence interval:
 0.04607661 1.00000000 
sample estimates:
      cor 
0.5835345
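
To see where the numbers in that output come from, here is a minimal sketch that recomputes them by hand from the same two example vectors. The variable names n, r, t_stat and p_one_sided are only for illustration; the formula is the standard t statistic for a Pearson correlation with n - 2 degrees of freedom.

```r
# Same example vectors as in the cor.test call above
sensor_output_1 <- c(30,30,30,30,30,30,120,120,120,120)
sensor_output_2 <- c(30,45,30,30,120,120,100,100,120,120)

n <- length(sensor_output_1)                   # number of paired readings
r <- cor(sensor_output_1, sensor_output_2)     # Pearson correlation coefficient
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)      # t statistic with n - 2 df
# Upper-tail probability, matching alternative = 'greater'
p_one_sided <- pt(t_stat, df = n - 2, lower.tail = FALSE)

print(c(r = r, t = t_stat, p = p_one_sided))
```

One caveat if you try this on real readings: the values are treated as plain numbers, so 330 and 30 count as 300 apart even though they are only 60 degrees apart on the dial.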

Ram · Answer 2 · 2013-09-03

11.7 years ago

Sudeep ★ 1.7k

A naive approach would be to treat each 4-minute sensor readout as one long string of text and calculate the similarity between the two strings using a string similarity measure, although this is not (entirely) the kind of sequence similarity you want.
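
A rough sketch of that string idea in R, under some assumptions of mine: each 30-degree zone is mapped to one letter (A to L), and similarity is taken as 1 minus the edit (Levenshtein) distance divided by the longer string length. The helpers encode_zones and string_similarity are made up for illustration; adist is the edit-distance function from R's utils package.

```r
# Hypothetical helper: map readings (0, 30, ..., 330 degrees) to letters A..L
encode_zones <- function(degrees) {
  zone <- (degrees %/% 30) %% 12               # zone index 0..11
  paste(LETTERS[zone + 1], collapse = "")      # one letter per second
}

# Hypothetical helper: 1 - normalized edit distance, a value in [0, 1]
string_similarity <- function(a, b) {
  d <- as.numeric(utils::adist(a, b))          # Levenshtein distance
  1 - d / max(nchar(a), nchar(b))
}

s1 <- encode_zones(c(30,30,30,30,30,30,120,120,120,120))
s2 <- encode_zones(c(30,60,30,30,120,120,90,90,120,120))
similarity <- string_similarity(s1, s2)

print(s1)
print(s2)
print(similarity)
```

Note that a plain edit distance counts every mismatch the same, so neighbouring zones are penalized as heavily as opposite ones, and it gives a score rather than a p-value.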

Updated 6.3 years ago by Ram 45k • written 11.7 years ago by Sudeep ★ 1.7k