Question

How to calculate polygenic hazard score (PHS)?


7.6 years ago

mbk0asis ▴ 700

Hi.

A recently published paper in PLOS Medicine reported the development of a polygenic hazard score (PHS) for quantifying individual differences in age-specific genetic risk for Alzheimer disease (AD), based on the hazard ratios of 33 disease-associated SNPs.

Later in the paper, they used this PHS to predict the age of AD onset, but I couldn't work out how they calculated it because of my limited knowledge of statistics.

Can someone show me how to use the hazard ratios to predict the age of onset?

Sorry for poor English.

Thank you!

polygenic hazard score Alzheimer statistics • 3.9k views

updated 6.9 years ago by jorma.o.jormakka • written 7.6 years ago by mbk0asis ▴ 700

score 0 · Answer 1 · 2017-04-10

For easy answering, this is from the Supplementary:

Using the IGAP Stage 1 sample, we first identified a list of SNPs associated with increased risk for AD, using a significance threshold of p < 10^-5. Next, we evaluated all IGAP-detected, AD-associated SNPs within the ADGC Phase 1 case-control dataset. Using a stepwise procedure in survival analysis, we delineated the ‘final’ list of SNPs for constructing the polygenic hazard score [12-13]. Specifically, using Cox proportional hazard models, we identified the top AD-associated SNPs within the ADGC Phase 1 cohort (excluding NIA ADC and ADNI samples), while controlling for the effects of gender, APOE variants, and top five genetic principal components (to control for the effects of population stratification). We utilized age of AD onset and age of last clinical visit to estimate ‘age appropriate’ hazards [14] and derived a PHS for each participant. In each step of the stepwise procedure, the algorithm selected one SNP from the pool that most improved model prediction (i.e. minimizing the Martingale residuals); additional SNP inclusion that did not further minimize the residuals resulted in halting of the SNP selection process. To prevent over-fitting in this training step, we used 1000x bootstrapping for model averaging and estimating the hazard ratios for each selected SNPs. We assessed the proportional hazard assumption in the final model using graphical comparisons.
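(Not from the Supplementary.) In a Cox proportional hazards model the coefficient for each SNP is the natural log of its hazard ratio, so one plausible reading of "derived a PHS for each participant" is the usual linear predictor: the sum over SNPs of log(HR) times the number of risk alleles carried. Here is a minimal sketch under that assumption. The SNP names and hazard ratios are made up for illustration and are not the values from the paper.

```python
import math

# Hypothetical per-allele hazard ratios (placeholders, NOT the paper's values).
hazard_ratios = {"snp_a": 1.20, "snp_b": 0.90, "snp_c": 1.05}


def polygenic_hazard_score(genotype, hazard_ratios):
    """Cox linear predictor: sum of log(HR_j) * allele count x_ij."""
    return sum(math.log(hr) * genotype[snp] for snp, hr in hazard_ratios.items())


# Allele counts (0, 1 or 2) for one hypothetical person.
person = {"snp_a": 2, "snp_b": 0, "snp_c": 1}

phs = polygenic_hazard_score(person, hazard_ratios)
# exp(PHS) is the hazard relative to someone carrying zero copies of every allele.
relative_hazard = math.exp(phs)
print(f"PHS = {phs:.4f}, relative hazard = {relative_hazard:.3f}")
```

Because the score lives on the log scale, adding scores corresponds to multiplying hazard ratios: here the relative hazard is 1.20 * 1.20 * 1.05.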

To assess for replication, we first examined whether the ADGC Phase 1 derived predicted PHSs could stratify individuals into different risk strata within the ADGC Phase 2 cohort. We next evaluated the relationship between predicted age of AD onset and the empirical/actual age of AD onset using cases from ADGC Phase 2. We binned risk strata into percentile bins and calculated the mean of actual age in that percentile as the empirical age of AD onset. In a similar fashion, we additionally tested replication within the NACC subset classified at autopsy as having a high level of AD neuropathologic change [25].
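(Not from the Supplementary.) The binning step described above can be sketched roughly as follows: sort cases by PHS, split them into equal-sized bins, and take the mean observed onset age in each bin. The function name, the number of bins, and the toy data are assumptions for illustration only.

```python
from statistics import mean


def mean_onset_age_by_bin(phs_scores, onset_ages, n_bins=4):
    """Mean observed onset age per PHS bin, from lowest to highest score."""
    order = sorted(range(len(phs_scores)), key=lambda i: phs_scores[i])
    n = len(order)
    result = []
    for b in range(n_bins):
        # Indices of the cases falling into bin b (roughly equal bin sizes).
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        result.append(mean(onset_ages[i] for i in idx) if idx else None)
    return result


# Toy data: made-up scores and onset ages for eight cases.
phs_scores = [0.1, -0.5, 0.9, 0.3, -0.2, 1.2, 0.0, 0.6]
onset_ages = [78, 84, 72, 76, 82, 70, 80, 74]

empirical_ages = mean_onset_age_by_bin(phs_scores, onset_ages)
print(empirical_ages)
```

These per-bin means are what would then be compared with the predicted onset age for the same bins.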

Because case-control samples cannot provide the proper baseline hazard [16], we used the previously reported annualized incidence rates by age, estimated from the general United States of America (US) population [17]. For each participant, by combining the overall population-derived incidence rates [17] and genotype-derived PHS, we calculated an individual’s ‘instantaneous risk’ for developing AD, based on their genotype and age (for additional details see Supporting Information). To independently assess the predicted instantaneous risk, we evaluated longitudinal follow-up data from 2,724 cognitively normal older individuals from the NIA ADC with at least 2 years of clinical follow-up. We assessed the number of cognitively normal individuals progressing to AD as a function of the predicted PHS risk strata and examined whether the predicted PHS-derived incidence rate reflects the empirical/actual progression rate using a Cochran-Armitage trend test.
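(Not from the Supplementary.) Under proportional hazards, "combining" a population incidence curve with a PHS amounts to multiplying the baseline hazard at each age by exp(PHS). The sketch below assumes that form, and also assumes the score is centred on the population mean so that an average person gets the population rate. The exponential baseline curve and its parameters are placeholders, not the published US incidence rates. To turn this into an onset age, the sketch accumulates hazard over age and reports the age at which the implied AD-free probability drops to 50%; this is one common definition and may not be the one the paper used.

```python
import math


def baseline_incidence(age):
    """Hypothetical annual incidence per person-year (placeholder parameters)."""
    return 0.001 * math.exp(0.12 * (age - 60))


def instantaneous_risk(age, phs, phs_population_mean=0.0):
    """Proportional hazards: baseline hazard scaled by exp(centred PHS)."""
    return baseline_incidence(age) * math.exp(phs - phs_population_mean)


def age_at_survival_threshold(phs, threshold=0.5, start_age=60.0,
                              max_age=110.0, step=0.1):
    """Age at which the AD-free probability exp(-cumulative hazard) hits threshold."""
    cumulative_hazard = 0.0
    age = start_age
    while age < max_age:
        # Simple Riemann sum of the hazard over small age steps.
        cumulative_hazard += instantaneous_risk(age, phs) * step
        age += step
        if math.exp(-cumulative_hazard) <= threshold:
            return age
    return None  # threshold not reached before max_age


for phs in (-1.0, 0.0, 1.0):
    print(phs, instantaneous_risk(75, phs), age_at_survival_threshold(phs))
```

With any increasing baseline curve, a higher PHS shifts the whole risk curve toward younger ages, which is how a score built from hazard ratios ends up predicting an earlier or later onset.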

We examined the association between our PHS and established in vivo and pathologic markers of AD neurodegeneration. Using linear models, we assessed whether the PHS associated with Braak stage for NFTs and CERAD score for neuritic plaques as well as CSF Aβ1-42, and CSF total tau. Using linear mixed effects models, we also investigated whether the PHS was associated with longitudinal CDR-SB score and volume loss within the entorhinal cortex and hippocampus. In all analyses, we co-varied for the effects of age and sex.

Yang J, Ferreira T, Morris AP, et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat Genet 2012;44:369-U170.
Dudbridge F. Power and Predictive Accuracy of Polygenic Risk Scores. PLoS Genet 2013;9.

score 0 · Answer 2 · 2017-12-14

I think PHS is the same as a polygenic score (PGS), which is an additive measure of how strongly a set of SNPs (alleles in genes) is associated with the phenotype, here the disease. You first select a set of SNPs and evaluate how often each SNP is associated with the phenotype. This is the frequency. Then you multiply this frequency by the estimated effect (small effect, large effect; you must choose a number). The frequency times the effect is the weight beta_j for SNP j, j = 1, ..., n.

Then you take the population of individuals i. For each individual i you calculate the polygenic score as the sum PS_i = (1/n) * sum_j beta_j * x_ij, where x_ij describes the allele that person has at SNP j. One could use three values for x_ij (0, 1, 2), so that 0 would have no effect on the phenotype and 2 quite a lot. If you want the PGS for the population, sum over i; you would probably also want to normalize the values, for instance to mean zero and SD = 1.

I do not know if this is what the article says they were doing, but this is how a PGS can be calculated, and it is in plain English by a non-native speaker.
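A small sketch of the calculation described in this answer, i.e. PS_i = (1/n) * sum_j beta_j * x_ij followed by standardizing the scores to mean 0 and SD 1 across the sample. The weights and genotypes are invented for illustration, and the population SD is used for the normalization, which is an arbitrary choice.

```python
from statistics import mean, pstdev

# Hypothetical weights beta_j for n = 3 SNPs (placeholders).
weights = [0.2, -0.1, 0.4]

# Allele counts x_ij (0, 1 or 2): one row per individual, one column per SNP.
genotypes = [
    [0, 1, 2],
    [2, 0, 1],
    [1, 1, 0],
    [0, 2, 2],
]


def polygenic_score(allele_counts, weights):
    """PS_i = (1/n) * sum_j beta_j * x_ij."""
    return sum(b * x for b, x in zip(weights, allele_counts)) / len(weights)


raw_scores = [polygenic_score(row, weights) for row in genotypes]

# Standardize across the sample so the scores have mean 0 and SD 1.
mu, sd = mean(raw_scores), pstdev(raw_scores)
standardized = [(s - mu) / sd for s in raw_scores]
print(raw_scores)
print(standardized)
```

The 1/n factor only rescales the score, so it disappears after standardization; what matters for ranking people is the weighted sum itself.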