I am benchmarking metagenome classifier software and noticed something interesting. I am using in silico generated datasets (so I know the exact microbial composition of my samples) to test the classifiers, and I would like to measure the dissimilarity between the original "gold standard" and the output of each classifier using Bray-Curtis dissimilarity. When I compare the results of two classifiers, both of which correctly identified all the species in the sample, the Bray-Curtis dissimilarity seems to be lower for the classifier that produced many false positives. Is this an actual bias of the statistical method? Which alternative beta-diversity metric could I use that penalizes false positives more strictly?
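
To make the observation concrete, here is a minimal sketch with made-up relative abundance profiles (these are hypothetical numbers, not my actual data, and the names `gold`, `clf_a`, and `clf_b` are only for illustration). It assumes the usual definition of Bray-Curtis dissimilarity, the sum of absolute abundance differences divided by the sum of all abundances. Classifier A reports only the three true species but with skewed abundances, while classifier B gets the true abundances nearly right and adds three low-abundance false positives.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator


def false_positives(gold, predicted):
    """Count taxa reported by the classifier that are absent from the gold standard."""
    return sum(1 for g, p in zip(gold, predicted) if g == 0 and p > 0)


# Hypothetical relative abundances over six taxa; only the first three are truly present.
gold = [0.50, 0.30, 0.20, 0.00, 0.00, 0.00]
clf_a = [0.70, 0.20, 0.10, 0.00, 0.00, 0.00]  # no false positives, skewed abundances
clf_b = [0.45, 0.28, 0.17, 0.04, 0.03, 0.03]  # three false positives, close abundances

bc_a = bray_curtis(gold, clf_a)
bc_b = bray_curtis(gold, clf_b)
fp_a = false_positives(gold, clf_a)
fp_b = false_positives(gold, clf_b)

print(f"classifier A: Bray-Curtis = {bc_a:.3f}, false positives = {fp_a}")
print(f"classifier B: Bray-Curtis = {bc_b:.3f}, false positives = {fp_b}")
```

In this toy case classifier B ends up with the lower dissimilarity (about 0.1 versus about 0.2) despite its three false positives, because each false positive contributes only its small abundance to the numerator.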