Question

p-value with chi-square test

6.9 years ago

Biologist ▴ 290

In this research paper, Table 1 ("Association of RAD51-AS1 expression with clinicopathological features of EOC patients") reports p-values calculated with a chi-square test.

 Age   Low-RAD51-AS1  High-RAD51-AS1 P-value
 <50    25 (38.5)      17 (26.6)       0.149
 ≥50    40 (61.5)      47 (73.4)

For the variable Age, the reported p-value is 0.149.

But when I ran the test myself, I got a different value.

# Rows: age <50 and >=50; x = Low-RAD51-AS1 counts, y = High-RAD51-AS1 counts
data <- data.frame(x = c(25, 40), y = c(17, 47))
chisq.test(data, correct = TRUE)

    Pearson's Chi-squared test with Yates' continuity
    correction

data:  data
X-squared = 1.5728, df = 1, p-value = 0.2098

This is not limited to Age: the other variables in the table also give p-values that differ from those in the paper.

What could explain these different p-values? Did I do anything wrong?

RNA-Seq statistics p-value significant test r • 2.3k views


score 1 · Answer 1 · 2018-08-16

6.9 years ago

Kevin Blighe 89k

Just switch off the continuity correction.

# df holds the same 2x2 counts, in columns named "High" and "Low"
chisq.test(df[,c("High", "Low")], correct=FALSE)

    Pearson's Chi-squared test

data:  df[,c("High", "Low")]
X-squared = 2.0794, df = 1, p-value = 0.1493
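
To see where the two results come from, here is a minimal sketch that computes both statistics by hand from the Age counts in the question. The object names (`tab`, `expected`, `x2_plain`, `x2_yates`) are illustrative and not taken from the paper or the posts above.

```r
# Counts from the Age rows of Table 1 (rows: <50, >=50; columns: Low, High)
tab <- matrix(c(25, 40, 17, 47), nrow = 2,
              dimnames = list(Age = c("<50", ">=50"),
                              RAD51_AS1 = c("Low", "High")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson statistic without any correction
x2_plain <- sum((tab - expected)^2 / expected)

# Yates' correction subtracts 0.5 from each |observed - expected| before squaring
x2_yates <- sum((abs(tab - expected) - 0.5)^2 / expected)

# Upper-tail p-values from the chi-squared distribution with 1 degree of freedom
p_plain <- pchisq(x2_plain, df = 1, lower.tail = FALSE)
p_yates <- pchisq(x2_yates, df = 1, lower.tail = FALSE)

round(c(plain = p_plain, yates = p_yates), 4)
```

The uncorrected value should reproduce the 0.1493 shown above (the paper's 0.149), and the corrected value the 0.2098 from the question. The correction always shrinks the statistic, so the corrected p-value is the larger of the two.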

Kevin

Thank you, Kevin. I would also like to know: is the calculation wrong if correct=TRUE? When should it be set to TRUE?

Reply • 6.9 years ago by Biologist ▴ 290

My background is not pure statistics - it's biology and computer science. That said, bioinformatics overlaps with statistics, and many bioinformaticians understand a good deal of statistical methodology, myself included.

Whilst I cannot give a complete definition of continuity correction, I am aware that it is used for reasons loosely similar to those for P value adjustment in expression studies, that is, to avoid overstating statistical significance. The Pearson chi-squared test approximates the distribution of a statistic computed from discrete counts with the continuous chi-squared distribution, and that approximation can be poor when the expected counts are small. The continuity correction attempts to 'adjust' for this.
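
As a rough way to judge whether the correction matters for a given table, the sketch below prints the expected counts and compares the corrected test, the uncorrected test, and Fisher's exact test on the Age counts from the question. The "expected counts of at least 5" threshold mentioned in the comments is a common rule of thumb, not something taken from the paper, and the object names are illustrative.

```r
# Age counts from the question (rows: <50, >=50; columns: Low, High)
tab <- matrix(c(25, 40, 17, 47), nrow = 2)

res_yates  <- chisq.test(tab, correct = TRUE)   # R's default for 2x2 tables
res_plain  <- chisq.test(tab, correct = FALSE)  # what the paper appears to report
res_fisher <- fisher.test(tab)                  # exact test, no large-sample approximation

# Expected counts under independence; a common rule of thumb treats the
# chi-squared approximation as reasonable when all of these are at least 5
res_plain$expected
min(res_plain$expected)

# Compare the three p-values side by side
c(yates  = res_yates$p.value,
  plain  = res_plain$p.value,
  fisher = res_fisher$p.value)
```

Neither setting is a wrong calculation; they are two variants of the same test. In `chisq.test`, `correct` only has an effect on 2 by 2 tables, and with expected counts as large as these either variant is defensible, provided the choice is reported.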

If you want to delve further into it, I suggest posting on StackExchange.

Reply • 6.9 years ago by Kevin Blighe 89k