6.3 years ago
Biologist
In Table 1 of this research paper (Association of RAD51-AS1 expression with clinicopathological features of EOC patients), I see that the p-value is calculated using a Chi-square test.
Age    Low-RAD51-AS1    High-RAD51-AS1    P-value
<50    25 (38.5)        17 (26.6)         0.149
≥50    40 (61.5)        47 (73.4)
For the variable Age, the reported p-value is 0.149, but when I calculated it myself I got a different value.
# columns: x = Low-RAD51-AS1, y = High-RAD51-AS1; rows: age <50, age >=50
data <- data.frame(x = c(25, 40), y = c(17, 47))
chisq.test(data, correct = TRUE)
Pearson's Chi-squared test with Yates' continuity correction
data: data
X-squared = 1.5728, df = 1, p-value = 0.2098
This is not limited to Age: the rest of the variables also give p-values that differ from those in the research paper.
What could be the reason for these different p-values? Did I do anything wrong?
Thank you Kevin. I would also like to know: is the calculation wrong if correct=TRUE? When should it be set to TRUE?
My background is not pure statistics - it's biology and computer science. That said, bioinformatics overlaps with statistics, and many bioinformaticians, myself included, understand a good deal of statistical methodology.
Whilst I cannot give a complete definition of continuity correction, I am aware that it is used for reasons loosely similar to P value adjustment in expression studies, that is, to avoid overestimating statistical significance. When we conduct Pearson's Chi-squared test, the test statistic is assumed to follow the continuous Chi-squared distribution, but the counts in our contingency table are discrete, so the approximation can be poor, particularly when the counts are small. The continuity correction (Yates' correction, which in a 2x2 table shrinks each |observed - expected| difference by 0.5) attempts to 'adjust' for this situation.
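To make the effect of the correction concrete, here is a minimal sketch in R using the Age counts from the question. The object names (tab, uncorrected, corrected, yates_stat) are only illustrative. With these counts, switching the correction off gives the 0.149 reported in the table, which suggests (but does not prove) that the paper used the uncorrected Pearson test.

```r
# 2x2 table from the question: rows = age group, columns = RAD51-AS1 group
tab <- matrix(c(25, 40, 17, 47), nrow = 2,
              dimnames = list(Age = c("<50", ">=50"),
                              RAD51_AS1 = c("Low", "High")))

# Same test with and without Yates' continuity correction
uncorrected <- chisq.test(tab, correct = FALSE)
corrected   <- chisq.test(tab, correct = TRUE)

uncorrected$p.value   # about 0.149, the value reported in the paper's table
corrected$p.value     # about 0.2098, the value obtained in the question

# Yates' correction by hand: shrink each |observed - expected| by 0.5
expected   <- uncorrected$expected
yates_stat <- sum((abs(tab - expected) - 0.5)^2 / expected)
yates_p    <- pchisq(yates_stat, df = 1, lower.tail = FALSE)

# Expected counts; the correction matters most when these are small
expected
```

Neither setting is a "wrong calculation"; they are two versions of the same test. The corrected version is the more conservative one, and here all expected counts are above 20, so the uncorrected test is a reasonable choice.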
If you want to delve further into it, I suggest posting on StackExchange.