3.7 years ago
L_to_the_m
Hi,
I have been doing a logistic regression with cases and controls in PLINK2. I have an output from PLINK, looking like this:
CHROM POS ID REF ALT A1 A1_CASE_CT A1_CTRL_CT CASE_ALLELE_CT CTRL_ALLELE_CT A1_FREQ A1_CASE_FREQ A1_CTRL_FREQ FIRTH? TEST OBS_CT OR LOG(OR)_SE Z_STAT P
19 39738787 rs12979860 C T T 7 400 10 1072 0.376155 0.7 0.373134 Y ADD 541 3.38932 0.512208 2.38308 0.0171687
How does PLINK2 get the p-value? I am a little confused about how the logistic regression is calculated. What makes an association significant?
Best,
Lukas
I think the null hypothesis here is "allele frequency in controls is not different from allele frequency in cases". Logistic regression challenges this by testing whether case/control status is significantly associated with allele frequency. Is that what you are asking?
Kind of, but thanks. I want to know which numbers PLINK compares in the analysis to get the p-values.
I would say 7/10 and 400/1072 (which give 0.7 and 0.37313), while (7 + 400) / (10 + 1072) gives 0.376. However, I don't see how 3.389 is the odds ratio; I'd expect it to be 1.876. Then you have a standard error and a z-score (log(OR) / LOG(OR)_SE) of 2.38308, which in turn gives a p-value of 0.0171687. I still don't get how they got an OR of 3.38932, though.
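To make the last step concrete, here is a minimal sketch that recomputes Z_STAT and P from the OR and LOG(OR)_SE columns of the row above. It assumes a Wald test, i.e. Z = ln(OR) / SE with a two-sided normal p-value, which is consistent with the reported numbers; the function name `wald_z_and_p` is made up for illustration.

```python
import math

def wald_z_and_p(odds_ratio, log_or_se):
    """Wald z-statistic and two-sided normal p-value for a log odds ratio."""
    # The test is done on the log scale: beta = ln(OR), z = beta / SE(beta).
    z = math.log(odds_ratio) / log_or_se
    # Two-sided p-value under a standard normal: P(|N(0,1)| > |z|).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Values copied from the OR and LOG(OR)_SE columns of the PLINK2 row.
z, p = wald_z_and_p(3.38932, 0.512208)
print(f"Z = {z:.4f}, p = {p:.5f}")
```

Up to rounding of the printed OR and SE, this should land on the Z_STAT and P columns, so the p-value is driven by the fitted coefficient and its standard error rather than by the raw frequencies directly.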
Ah well, it could be that the genotype is coded not as 0/1 but as 0/1/2 (the TEST column says ADD), and then the odds ratio also depends on the number of homozygous variant carriers. I believe the effect of each allele copy is additive on the log-odds scale, so it can't simply be read off a contingency table.
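As a small illustration of what additive 0/1/2 coding implies, the sketch below treats the reported OR as a per-allele odds ratio, so the log-odds shift by ln(OR) for each copy of A1. This is an assumption about how to read the ADD row, not PLINK2's code, and the helper name is hypothetical.

```python
import math

# Per-allele odds ratio taken from the OR column of the row above.
OR_PER_ALLELE = 3.38932
beta = math.log(OR_PER_ALLELE)  # additive effect on the log-odds scale

def odds_ratio_vs_zero_copies(dosage):
    """Odds ratio of a genotype with `dosage` copies of A1 versus 0 copies."""
    return math.exp(beta * dosage)

for dosage in (0, 1, 2):
    print(dosage, round(odds_ratio_vs_zero_copies(dosage), 3))
```

Under this model a homozygous carrier has an odds ratio of OR squared relative to a non-carrier, which is why the estimate depends on the genotype distribution and not only on the allele counts.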
At the most basic level, it's just a p-value from a 2x2 contingency table of minor allele counts - A: SNP dataset and Z Score
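For comparison, here is a rough sketch of that basic allelic 2x2 test, using the counts from the row above. The non-A1 counts are derived as CASE_ALLELE_CT - A1_CASE_CT and CTRL_ALLELE_CT - A1_CTRL_CT. A plain Pearson chi-square without continuity correction is assumed here, and the function name is made up for the example.

```python
import math

def allelic_2x2_test(a1_case, other_case, a1_ctrl, other_ctrl):
    """Allelic odds ratio and Pearson chi-square test (1 df) for a 2x2 table."""
    a, b, c, d = a1_case, other_case, a1_ctrl, other_ctrl
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    # Pearson chi-square for a 2x2 table, no continuity correction.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return odds_ratio, chi2, p

# Cases: 7 A1 alleles out of 10; controls: 400 A1 alleles out of 1072.
odds_ratio, chi2, p = allelic_2x2_test(7, 10 - 7, 400, 1072 - 400)
print(f"OR = {odds_ratio:.2f}, chi2 = {chi2:.3f}, p = {p:.4f}")
```

The allelic odds ratio from this table is (7 x 672) / (3 x 400) = 3.92, which is neither the 1.876 frequency ratio nor the 3.38932 that PLINK2 reports. That fits the output row: the FIRTH? column is Y, so the reported OR comes from a Firth-penalized regression on genotype dosage rather than from this table. With only 10 case alleles the expected counts are small, so an exact test would be a safer choice than the chi-square approximation for this kind of table.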