Why is the p-value 0, and how can I keep precision for small values?
Asked 4.0 years ago by Star ▴ 60

Hi all,

I am trying to calculate a p-value from a z-score, but the p-value comes out as 0.

2*pnorm(-abs(37.9353259515))
[1] 0

However, with a somewhat smaller z-score, I get an accurate p-value.

2*pnorm(-abs(31.4730702149))
[1] 2.029847e-217

Can anyone suggest a workaround to get an accurate p-value in the first case instead of 0?

Cheers!

Answer (6 votes, 4.0 years ago):

Back in the days when computers didn't have the oomph they have today, you'd see a lot of -log10(value) calculations used to report very small numbers. Working with or presenting log-transformed values may still be the simplest alternative, until you reach a point where you need to report high-precision results with Rmpfr or a similar library, depending on your programming environment.
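For example, base R can already hand you the logarithm directly: pnorm() has a log.p argument, which sidesteps the underflow entirely. A minimal sketch, using the z-score from the question:

# pnorm(..., log.p = TRUE) returns log(P(Z <= q)) without ever forming
# the underflowing p-value itself.
z <- 37.9353259515
log_p <- log(2) + pnorm(-abs(z), log.p = TRUE)  # natural log of the two-sided p
log_p / log(10)                                 # on the log10 reporting scale
# [1] -314.1716   (i.e. p is about 6.7e-315, matching the Rmpfr result below)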

library(Rmpfr)
# Wrap values in 200-bit multiple-precision floats; pnorm() then dispatches
# to Rmpfr's method and computes in that precision.
.N <- function(.) mpfr(., precBits = 200)

2 * pnorm(-abs(.N(c(31.4730702149, 37.9353259515))))
# 2 'mpfr' numbers of precision  200   bits 
# [1] 2.0298465195466694380461698263104467695985404891415345411446557e-217
# [2] 6.7359508146380783828098958784721518979443306021620032358869696e-315
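(Note that MPFR's exponent range is huge and independent of precBits, so the 200 bits above mainly buy extra significand digits; even a much smaller precBits would avoid the underflow itself.)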

As to the "why" behind R's native behavior: R uses double-precision (64-bit IEEE 754) floating-point values internally. That gives a working range of magnitudes from approximately ±4.9e-324 (the smallest subnormal) or ±2.23e-308 (the smallest normal) up to about ±1.8e308.

Given your example, where the true value (about 6.7e-315) falls below the smallest normal double, I'd say R's pnorm() underflows to 0 once the result leaves the normal range, rather than returning a subnormal value.

Long story short: if you need to work with numerical values outside this range (and there are cases where data need to be ranked by scores of this type; see the sketch below), either log-transform your numbers or use a third-party library for 128-bit or greater precision. IEEE 128-bit (binary128) precision supports magnitudes from approximately ±6.48e-4966 (smallest subnormal) or ±3.36e-4932 (smallest normal) up to about ±1.19e4932.
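On the ranking case: ordering on log(p) gives exactly the same ranks as ordering on p, because the logarithm is monotone, so no high-precision library is needed. A small base-R sketch (the third z-score is a made-up value for illustration):

z <- c(31.4730702149, 37.9353259515, 5.2)
log_p <- log(2) + pnorm(-abs(z), log.p = TRUE)  # log of the two-sided p-values
order(log_p)                                    # most significant first
# [1] 2 1 3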

Keep in mind that the more precision you request from a third-party library, the longer calculations will take; performance is already a concern in R. A log-transform is very probably cheap and easy by comparison.

Answer (3 votes, 4.0 years ago):

You've reached the limits of machine precision. A value like 1e-217 (a 1 preceded by 216 zeros after the decimal point) is already not meaningfully different from 0, and I don't see how adding more zeros would help you. This looks to me like a case of the XY problem.

Comment:

(This is probably good enough to be an answer rather than a comment.)

Comment:

Is it architecture-dependent? I get the following on:

  • Scientific Linux release 7.4 (Nitrogen)
  • Kernel: Linux 3.10.0-514.6.1.el7.x86_64 x86_64
  • CPU: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz


.Machine$double.xmin
[1] 2.225074e-308


options(scipen=999)  # disable scientific notation to print every digit
.Machine$double.xmin
[1] 0.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002225074
Comment:
This is a complicated topic and I am not sure I understand all of it. It comes down to floating-point arithmetic. As far as I know, machine precision depends on the numeric type (float, double, or long double), on the programming language (or the compiler), and also on the OS. Most values in use are defined by the IEEE 754 standard, but not every application has to follow it. To complicate matters, there are different "kinds" of machine precision. Strictly speaking, machine precision is the smallest number eps such that 1 + eps != 1 (i.e. .Machine$double.eps). Separate from that is the smallest positive normal number (.Machine$double.xmin); subnormal values can be smaller still. In R, .Machine tells you what the machine is capable of, but that does not mean it is what R, or every package, actually uses.
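For what it's worth, the two notions are easy to inspect side by side in R. The first two values below are the standard IEEE 754 double-precision constants; the last line is my own illustration of the subnormal limit, which .Machine does not report directly:

.Machine$double.eps           # relative spacing: smallest eps with 1 + eps != 1
# [1] 2.220446e-16
.Machine$double.xmin          # smallest positive *normal* double
# [1] 2.225074e-308
.Machine$double.xmin / 2^52   # smallest positive subnormal
# [1] 4.940656e-324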

