Question

How to calculate p-values from PLINK output by hand

3.8 years ago

L_to_the_m ▴ 10

Hi,

I ran a logistic regression on cases and controls in PLINK2. The output looks like this:

  CHROM  POS       ID          REF  ALT  A1  A1_CASE_CT  A1_CTRL_CT  CASE_ALLELE_CT  CTRL_ALLELE_CT  A1_FREQ   A1_CASE_FREQ  A1_CTRL_FREQ  FIRTH?  TEST  OBS_CT  OR       LOG(OR)_SE  Z_STAT   P
  19     39738787  rs12979860  C    T    T   7           400         10              1072            0.376155  0.7           0.373134      Y       ADD   541     3.38932  0.512208    2.38308  0.0171687

How does PLINK2 get the p-value? I am a little confused about how the logistic regression is calculated. What makes an association significant?

Best,

Lukas

plink p-values • 1.5k views

ADD COMMENT • link 3.8 years ago by L_to_the_m ▴ 10

I think the null hypothesis here is "allele frequency in controls is not different from allele frequency in cases". Logistic regression challenges this by testing whether case/control status is significantly associated with allele frequency. Is that what you are asking?

ADD REPLY • link 3.8 years ago by German.M.Demidov ★ 2.9k

Kind of, but thanks. I want to know which numbers PLINK compares in the analysis to get the p-values.

ADD REPLY • link 3.8 years ago by L_to_the_m ▴ 10

I would say it compares 7/10 and 400/1072 (which give 0.7 and 0.37313); (7 + 400) / (10 + 1072) gives the overall frequency of 0.376. However, I don't see how 3.389 is the odds ratio; I'd expect 1.876 (0.7 / 0.37313). Then you have a standard error and a z-score (log(OR) / LOG(OR)_SE) of 2.38308, which in turn gives a p-value of 0.0171687. I still don't get how they arrived at an OR of 3.38932, though.
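
The last step (OR and standard error to z-score to p-value) can be checked by hand. Below is a minimal sketch using only the numbers from the output row in the question. It assumes the z-score is the log odds ratio divided by LOG(OR)_SE and that the p-value is the two-sided tail of a standard normal distribution. It does not reproduce how PLINK2 fits the regression itself.

```python
import math

# Values copied from the PLINK2 output row in the question.
odds_ratio = 3.38932
log_or_se = 0.512208

# Wald z-statistic: the log odds ratio divided by its standard error.
z = math.log(odds_ratio) / log_or_se

# Two-sided p-value under a standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
p = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.5f}")
print(f"p = {p:.6f}")
```

Both values should land very close to the Z_STAT and P columns, up to rounding of the printed OR.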

ADD REPLY • link 3.8 years ago by German.M.Demidov ★ 2.9k

Ah well, it may be that the genotype is coded not as 0/1 but as 0/1/2, in which case the odds ratio also depends on the number of homozygous variant carriers. I believe the effect of each allele is linear in log-odds space and thus can't simply be decomposed into a contingency table.
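
To illustrate what "linear in log-space" would mean here, a small sketch assuming the ADD test models the A1 dosage (0, 1 or 2 copies) additively on the log-odds scale. Under that assumption the reported OR is a per-allele odds ratio. The name `relative_odds` is purely illustrative.

```python
# Per-allele odds ratio reported for the ADD test (from the question).
per_allele_or = 3.38932

# Under an additive model, log-odds change linearly with the A1 dosage (0, 1, 2),
# so the odds relative to dosage 0 are the per-allele OR raised to the dosage.
relative_odds = {dosage: per_allele_or ** dosage for dosage in (0, 1, 2)}

for dosage, odds in relative_odds.items():
    print(f"A1 copies = {dosage}: odds relative to 0 copies = {odds:.3f}")
```

Note that FIRTH? is Y for this variant, which suggests the estimate comes from a penalized (Firth) fit rather than from raw allele counts, so it need not match a simple table-based odds ratio.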

ADD REPLY • link 3.8 years ago by German.M.Demidov ★ 2.9k

At the most basic level, it's just a p-value from a 2x2 contingency table of minor allele counts; see A: SNP dataset and Z Score
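
As a rough sketch of that basic allelic test, using the allele counts from the output row in the question: this is a plain Pearson chi-square on the 2x2 table without continuity correction, not the regression PLINK2 actually ran, so the odds ratio and p-value are not expected to match the OR and P columns exactly.

```python
import math

# Allele counts taken from the PLINK2 output row in the question.
a1_case, case_total = 7, 10
a1_ctrl, ctrl_total = 400, 1072

# 2x2 table: rows = cases/controls, columns = A1 allele / other allele.
a, b = a1_case, case_total - a1_case
c, d = a1_ctrl, ctrl_total - a1_ctrl
n = a + b + c + d

# Allelic odds ratio from the cross-product of the table.
allelic_or = (a * d) / (b * c)

# Pearson chi-square statistic (1 degree of freedom, no continuity correction).
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# For 1 degree of freedom, the upper-tail probability equals erfc(sqrt(chi2 / 2)).
p = math.erfc(math.sqrt(chi2 / 2))

print(f"allelic OR = {allelic_or:.2f}, chi2 = {chi2:.3f}, p = {p:.4f}")
```

With only 10 case alleles, the expected count in one cell falls below 5, so the chi-square approximation is shaky here and an exact test would probably be a safer choice.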

ADD REPLY • link 3.8 years ago by Kevin Blighe 88k