I would do a linear model modelling on a poisson distribution.
First, merge your datasets into a single dataframe with two columns, like the following:
> d1 = data.frame(x=sample(seq(0,1,0.01),100,replace=T), dataset='d1')
> d2 = data.frame(x=sample(seq(0,1,0.01),100,replace=T), dataset='d2')
> mydf = rbind(d1, d2)
Then you can use a glm model as the following. The poisson distribution is a good choice in this case because you have discrete value with non-negative numbers.
> summary(glm(x~dataset, data=mydf, family='poisson'))
Call:
glm(formula = x ~ dataset, family = "poisson", data = mydf)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.01617 -0.42070 0.01332 0.33097 0.63025
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.70401 0.14219 -4.951 7.37e-07 ***
datasetd2 0.04294 0.19896 0.216 0.829
---
....
This summary is telling that an element from dataset d1 has a negative odds ratio (-0.7) of being equal to 1, which is quite in line with the data given that it ranges from 0 to 1.
If the two datasets are different, then you would observe a significant p-value at dataset d2. An element from d2 is slightly more likely (odds ratio of +0.04294 higher than d1) to be equal to 1, but this is not significant (p-value 0.829).
It may be interesting to multiply your scores by 100. In this case the regression will give you the odds of having a difference of 0.01 in the original data:
> glm(I(x*100)~dataset, data=mydf, family='poisson') %>% summary
Call:
glm(formula = I(x * 100) ~ dataset, family = "poisson", data = mydf)
Deviance Residuals:
Min 1Q Median 3Q Max
-10.1617 -4.2070 0.1332 3.3097 6.3025
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.90116 0.01422 274.360 <2e-16 ***
datasetd2 0.04294 0.01990 2.158 0.0309 *
....
This is telling you that an element in d1 has an high odds ratio of being higher than 0.01. An element in d2 is also slightly more likely to be 0.01 higher than an element in d1 (p-value 0.0309)
------EDIT -------
Sorry for the many edits. Maybe it is easier to interpret the regression using a identity rather than a log link.
This time we can do it with a dataset derived from randu, which should be reproducible:
> d1 = data.frame(x=round(randu$x[0:200],2), dataset='d1')
> d2 = data.frame(x=round(randu$x[201:400],2), dataset='d2')
> mydf = rbind(d1, d2)
> summary(glm(x~dataset, data=mydf, family=poisson(link='identity')) )
Call:
glm(formula = x ~ dataset, family = poisson(link = "identity"),
data = mydf)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.02699 -0.34096 0.02005 0.32455 0.58137
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.52735 0.05135 10.270 <2e-16 ***
datasetd2 -0.00195 0.07255 -0.027 0.979
---
....
The regression now tells you that dataset d1 has a mean of 0.52745 and that this is significantly different from 0. Dataset d2 has a mean 0.00195 lower than d1 (it should be 0.52540), and there is no significant difference.
In alternative, since your values range from 0 to 1, you could also model on a binomial distribution:
> summary(glm(x~dataset, data=mydf, family='binomial'))
Call:
glm(formula = x ~ dataset, family = "binomial", data = mydf)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.20523 -0.54424 0.01911 0.51139 1.18660
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.02160 0.20001 -0.108 0.914
datasetd2 0.08682 0.28293 0.307 0.759
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 83.948 on 199 degrees of freedom
Residual deviance: 83.854 on 198 degrees of freedom
AIC: 280.86
maybe this helps:
http://stats.stackexchange.com/questions/88764/test-for-difference-between-2-empirical-discrete-distributions
http://www.mathworks.com/matlabcentral/answers/63476-quantify-difference-between-discrete-distributions