Question

What kind of statistical test should I use? (Permutation test)

2.9 years ago

aa4120090 ▴ 10

I have the following sequence data

A_True: 45, 92, 134, 156, 199

A_Pred: 23, 44, 45, 46, 88, 156, 187, 188, 189, 210

These numbers represent positions in a sequence. The total length of sequence A is 230. A_True holds the actual positions (ground truth) and A_Pred holds the positions predicted by a model, so A_Pred is 10 predicted positions out of 230.

I would like to know whether there is a statistical test that can evaluate whether the positions in A_Pred hit or fall close to those in A_True (that is, whether A_Pred is near A_True or looks randomly picked). It should take the following into account:

If the numbers in A_Pred coincide with those in A_True, that is a perfect match, with a penalty of 4 extra numbers (the predicted set can contain more numbers than the true set, but there should be some penalty for mismatches).
92 (in A_True) and 88 (in A_Pred) are a distance of 4 apart.
199 (in A_True) is a distance of 10 from 189 (in A_Pred) and 11 from 188 and 210 (in A_Pred).
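
The distances listed above can be reproduced with a short sketch. The helper name nearest_distances is made up for illustration, and "distance" is assumed to mean the absolute difference between two positions.

```python
# Illustrative sketch: distance from each true position to its nearest prediction.
# Assumes "distance" means the absolute difference between two positions.
a_true = [45, 92, 134, 156, 199]
a_pred = [23, 44, 45, 46, 88, 156, 187, 188, 189, 210]


def nearest_distances(true_positions, pred_positions):
    """Map each true position to the distance of its closest predicted position."""
    return {t: min(abs(t - p) for p in pred_positions) for t in true_positions}


distances = nearest_distances(a_true, a_pred)
print(distances)  # {45: 0, 92: 4, 134: 22, 156: 0, 199: 10}
```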

I have more list pairs (B_True, B_Pred, C_True, C_Pred, ...). Is there a statistical test that can serve this purpose?

My other thought is to use the combination 230C10 and compute the statistical significance of A_Pred containing the numbers in A_True. But a combination cannot represent whether a prediction is "near" (88 in A_Pred is near 92 in A_True), as it treats every value as a distinct number. My problem is about position.
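
For reference, the combination-based idea amounts to a hypergeometric calculation, which might be sketched as follows. The variable names and the helper prob_exact_hits are illustrative, and the sketch assumes that the 10 predictions are drawn without replacement from 230 positions and that only exact hits count, which is exactly the limitation described above.

```python
from math import comb

# Assumptions for illustration: 230 positions, 5 true positions, 10 predictions
# drawn without replacement, and only exact hits count (nearness is ignored).
n_positions, n_true, n_pred = 230, 5, 10
a_true = {45, 92, 134, 156, 199}
a_pred = {23, 44, 45, 46, 88, 156, 187, 188, 189, 210}
observed_hits = len(a_true & a_pred)  # 45 and 156 are the exact hits


def prob_exact_hits(k):
    """Hypergeometric probability of exactly k exact hits under random guessing."""
    return (
        comb(n_true, k)
        * comb(n_positions - n_true, n_pred - k)
        / comb(n_positions, n_pred)
    )


# One-sided p-value: probability of at least as many exact hits as observed.
p_value = sum(prob_exact_hits(k) for k in range(observed_hits, n_true + 1))
print(observed_hits, p_value)
```

A near miss such as 88 versus 92 contributes nothing here, so this only answers the exact-overlap part of the question.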

sequence statistical test • 1.1k views

I don't know the answer, but just to sort of clarify the problem, if I were to guess 10 numbers between 1 and 230, and then calculate the distance between each guess, and the closest number in A_True (assuming no ordering of guesses), the sum of distances would be greater for my random guesses than your predictor (hopefully). And if I were to redo my guesses a million times I could build a curve reflecting the frequency distribution of distances generated by a random process, that could potentially be used to judge a predictor.
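
The idea above might be sketched roughly as follows. This is only one possible reading: it assumes positions run from 1 to 230, that a random predictor picks 10 distinct positions uniformly, and that the score is the sum over guesses of the distance to the closest true position. The names distance_sum, null_scores and n_sim are made up for the example.

```python
import random

# Assumptions for illustration: positions run from 1 to 230, a random predictor
# picks 10 distinct positions uniformly, and the score is the sum over guesses
# of the distance to the closest true position (smaller is better).
a_true = [45, 92, 134, 156, 199]
a_pred = [23, 44, 45, 46, 88, 156, 187, 188, 189, 210]
seq_len = 230
n_sim = 20_000  # fewer than a million, to keep the sketch quick


def distance_sum(guesses, true_positions):
    """Sum of distances from each guess to its nearest true position."""
    return sum(min(abs(g - t) for t in true_positions) for g in guesses)


rng = random.Random(0)  # fixed seed so the run is repeatable
observed = distance_sum(a_pred, a_true)  # 72 for the data in the question

# Scores of purely random predictors: the null distribution.
null_scores = [
    distance_sum(rng.sample(range(1, seq_len + 1), len(a_pred)), a_true)
    for _ in range(n_sim)
]

# One-sided empirical p-value: how often random guessing scores at least as well.
# The +1 terms count the observed score itself, so the estimate is never zero.
p_value = (sum(s <= observed for s in null_scores) + 1) / (n_sim + 1)
print(observed, p_value)
```

A procedure of this kind is usually described as a Monte Carlo (or randomization/permutation) test. Note that this particular score only looks from each guess to the nearest true position; a symmetric variant could also add the distance from each true position to its nearest guess, so that missed true positions are penalized.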

Comment • 2.9 years ago by seidel 11k

Yes. May I ask what kind of statistic I should use to interpret the frequency distribution? Is there a technical name for this method?

Comment • 2.9 years ago by aa4120090 ▴ 10

score 2 · Accepted Answer · 2022-01-21

You may want to try the two-sample Kolmogorov-Smirnov (KS) test, which compares the distributions F(x) and G(x) of two independent one-dimensional samples. SciPy implements it in Python as scipy.stats.ks_2samp. The output is the KS statistic and a p-value for the null hypothesis that the two samples came from the same distribution. A KS statistic of 0 with a p-value of 1 means the two empirical distributions are identical. A larger KS statistic with a p-value < 0.05 rejects the null hypothesis, meaning that the distributions are different.

A short python code with your data:

import numpy as np
from scipy.stats import ks_2samp

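# True and predicted positions from the question
data1 = np.array([45, 92, 134, 156, 199])
data2 = np.array([23, 44, 45, 46, 88, 156, 187, 188, 189, 210])
# Two-sample KS test; returns the KS statistic and the p-value
ks_2samp(data1, data2)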

The output:

Ks_2sampResult(statistic=0.3, pvalue=0.8618938799547571)

On the surface, this test fails to reject the null hypothesis of identical distributions, which would indicate that the predictions could be similar enough to the true values. Yet part of the reason for not being able to reject the null hypothesis is that these are small samples. Here is what you get from five runs of the test on two random samples of 5 and 10 integers in the 0-200 range, which should be sufficiently similar to your true and predicted data.

# Two random samples of 5 and 10 integers in [0, 200); rerun for each result below
data1a = np.random.randint(0, 200, 5)
data2a = np.random.randint(0, 200, 10)
ks_2samp(data1a, data2a)

Ks_2sampResult(statistic=0.4, pvalue=0.5402481714873653)
Ks_2sampResult(statistic=0.4, pvalue=0.5402481714873653)
Ks_2sampResult(statistic=0.8, pvalue=0.01159044044395834)
Ks_2sampResult(statistic=0.3, pvalue=0.8618938799547571)
Ks_2sampResult(statistic=0.6, pvalue=0.11032754154370181)

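Only one of the five runs (statistic = 0.8, p ≈ 0.012) rejects the null hypothesis at the 0.05 level, and the others give statistics of 0.3 to 0.6, even though both samples are drawn at random each time. In other words, at these sample sizes purely random positions can score about as well as your predictions (statistic = 0.3). This would be something to test comprehensively, but my guess is that in your case a KS statistic <= 0.2 would mean good predictions, and larger values than that not good predictions.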
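
A more comprehensive check could look roughly like the sketch below, which simulates many random pairs and asks how often they reach a given KS statistic. It is an illustration under assumptions, not a validated procedure: the random samples are 5 and 10 integers in [0, 200), mirroring the randint calls above, and ks_statistic is a hand-written helper that computes the same quantity as the statistic field returned by ks_2samp (the largest gap between the two empirical CDFs), so the sketch needs only the standard library.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
        for v in set(a) | set(b)
    )


a_true = [45, 92, 134, 156, 199]
a_pred = [23, 44, 45, 46, 88, 156, 187, 188, 189, 210]
observed = ks_statistic(a_true, a_pred)  # about 0.3, as reported by ks_2samp

# Assumption for illustration: random samples of 5 and 10 integers in [0, 200).
rng = random.Random(0)  # fixed seed so the run is repeatable
n_sim = 10_000
null_stats = [
    ks_statistic(
        [rng.randrange(200) for _ in range(5)],
        [rng.randrange(200) for _ in range(10)],
    )
    for _ in range(n_sim)
]

# Share of random pairs that look at least as similar as the real pair,
# and share that would pass the guessed 0.2 cutoff (small tolerance for floats).
share_as_good = sum(s <= observed + 1e-9 for s in null_stats) / n_sim
share_at_most_02 = sum(s <= 0.2 + 1e-9 for s in null_stats) / n_sim
print(observed, share_as_good, share_at_most_02)
```

If a large share of random pairs reaches the observed statistic, the KS result says little about prediction quality at these sample sizes; the share below 0.2 indicates how strict the guessed cutoff would be.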

Kullback–Leibler divergence may be worth checking out as well.