chisq.test usage and output interpretation
2.2 years ago
nanodano ▴ 30

Hi there,

I would like to figure out if my categorical assignment for eye color is significantly different from my colleagues' assignment across the samples. The data looks something like this:

head(cats)
 sampleID  SR_lables MY_lables
1   SA4001 dark brown dark brown
2   SA4002 dark brown dark brown
3   SA4003 dark brown dark brown
4   SA4004 dark brown dark brown
5   SA4005 dark brown dark brown
6   SA4006 dark brown dark brown

I think a chisq.test in R would be the best route to assess this (if you have other recommendations please let me know). But I'm not quite understanding how to go about it. I've got this far:

chisq.test(cats$SR_lables, cats$MY_lables)
Pearson's Chi-squared test

data:  cats$SR_lables and cats$MY_lables
X-squared = 1591.4, df = 42, p-value < 2.2e-16

But I'm not sure if I went about it correctly or what the output is telling me. Is it telling me that our assignments are significantly different from one another? Any help is appreciated! Thanks!


No. The null hypothesis is that the factor levels of 'x' and 'y' are statistically independent, and that hypothesis is rejected here. But if your colleague consistently labelled green eyes as brown, that would still count as association, even though you have a mismatch.
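A toy illustration of that point, with made-up labels (not the question's data): a rater who agrees perfectly and a rater who systematically swaps brown and green produce exactly the same chi-squared statistic, because both tables show perfect association.

```r
# Made-up ratings: 20 brown, 20 green, 20 blue eyes.
rater_a <- rep(c("brown", "green", "blue"), each = 20)

perfect <- rater_a                                  # full agreement
swapped <- c(brown = "green", green = "brown",      # consistent relabelling:
             blue  = "blue")[rater_a]               # brown <-> green swapped

# Both tests reject independence equally strongly, even though the
# second rater disagrees on two thirds of the samples.
p_perfect <- chisq.test(table(rater_a, perfect))$p.value
p_swapped <- chisq.test(table(rater_a, swapped))$p.value
```

Both p-values are identical and tiny, which is why a significant chisq.test says nothing about whether the two raters agree.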

One could more explicitly model this experiment as a Bernoulli process, with 0 for each mismatch between the two of you and 1 for each match, and perform a binomial test (closely related to a chi-squared goodness-of-fit test): binom.test(number of matches, total number of samples, expected probability of a match, alternative = "two.sided"). In reality, it is probably not a perfect Bernoulli process, because the component Bernoulli variables are not independent (prior assessments likely influence your current assessments, i.e. a training effect) and the likely explanatory variable, eye colour, is not considered (you might be more in alignment for, say, brown eyes than for hard-to-discriminate blue eyes).
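A minimal sketch of that binomial test; the counts (380 matches out of 400 samples) and the chance-agreement probability of 0.25 are made-up numbers, and in practice you would estimate the expected probability from the marginal label frequencies of the two raters.

```r
matches  <- 380    # assumed: samples where both raters gave the same label
n_total  <- 400    # assumed: samples rated by both
p_chance <- 0.25   # assumed probability of agreeing by chance

# H0: the observed agreement rate equals the chance rate p_chance.
res_bt <- binom.test(x = matches, n = n_total, p = p_chance,
                     alternative = "two.sided")
res_bt$estimate   # observed agreement fraction
res_bt$p.value
```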


If you're interested in measuring agreement, you're using the wrong test. A chi-squared test measures association/independence, and will therefore give the same p-value whether the two annotators agree all the time or disagree all the time. In your situation, you could simply report the fraction/percentage of cases where there is agreement. It is also very common to use Cohen's kappa, although its use has been criticized. There are alternatives as well; search for measures of inter-rater agreement/reliability.
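A sketch of both suggestions, reusing the question's column names (SR_lables, MY_lables); the two helper functions are hypothetical, and ready-made kappa implementations exist in packages such as irr (kappa2) and psych (cohen.kappa).

```r
# Fraction of samples where the two raters assigned the same label.
percent_agreement <- function(x, y) mean(x == y)

# Cohen's kappa from the confusion table: (p_obs - p_exp) / (1 - p_exp),
# where p_exp is the agreement expected by chance from the marginals.
cohens_kappa <- function(x, y) {
  lv  <- union(as.character(x), as.character(y))  # shared level set
  tab <- table(factor(as.character(x), lv), factor(as.character(y), lv))
  p_obs <- sum(diag(tab)) / sum(tab)                      # observed agreement
  p_exp <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2  # chance agreement
  (p_obs - p_exp) / (1 - p_exp)
}

# e.g., with the question's data frame:
# percent_agreement(cats$SR_lables, cats$MY_lables)
# cohens_kappa(cats$SR_lables, cats$MY_lables)
```

Kappa corrects the raw agreement fraction for the agreement you would expect by chance alone, which matters when one label (such as dark brown here) dominates.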
