To make this easier to answer, here is the relevant passage from the Supplementary:
Using the IGAP Stage 1 sample, we first identified a list of SNPs associated with increased risk for AD, using a significance threshold of p < 10^-5. Next, we evaluated all IGAP-detected, AD-associated SNPs within the ADGC Phase 1 case-control dataset. Using a stepwise procedure in survival analysis, we delineated the ‘final’ list of SNPs for constructing the polygenic hazard score [12-13]. Specifically, using Cox proportional hazard models, we identified the top AD-associated SNPs within the ADGC Phase 1 cohort (excluding NIA ADC and ADNI samples), while controlling for the effects of gender, APOE variants, and top five genetic principal components (to control for the effects of population stratification). We utilized age of AD onset and age of last clinical visit to estimate ‘age appropriate’ hazards [14] and derived a PHS for each participant. In each step of the stepwise procedure, the algorithm selected one SNP from the pool that most improved model prediction (i.e. minimizing the Martingale residuals); additional SNP inclusion that did not further minimize the residuals resulted in halting of the SNP selection process. To prevent over-fitting in this training step, we used 1000x bootstrapping for model averaging and estimating the hazard ratios for each selected SNP. We assessed the proportional hazard assumption in the final model using graphical comparisons.
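As a rough illustration of the scoring step: in a Cox proportional hazards model the hazard has the form h(t | x) = h0(t) * exp(sum of beta_i * x_i), so one plain reading is that a person's PHS is the weighted sum of their risk-allele counts, with one weight (a log hazard ratio) per selected SNP. The sketch below assumes that reading. The SNP names and weights are made-up placeholders, not the values estimated in the paper, and the function names are hypothetical.

```python
import math

# Hypothetical per-SNP log hazard ratios (beta). Placeholder values only;
# the real weights would come from the paper's bootstrapped Cox models.
BETAS = {"snp_a": 0.10, "snp_b": -0.05, "snp_c": 0.30}


def polygenic_hazard_score(genotype, betas=BETAS):
    """Return sum(beta_i * x_i), where x_i is the allele count (0, 1 or 2).

    SNPs missing from `genotype` are counted as 0 copies (an assumption
    made here for simplicity).
    """
    return sum(beta * genotype.get(snp, 0) for snp, beta in betas.items())


def relative_hazard(phs, reference_phs=0.0):
    """Hazard ratio versus a reference score: exp(phs - reference_phs)."""
    return math.exp(phs - reference_phs)


# Hypothetical person: allele counts for the three placeholder SNPs.
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
phs = polygenic_hazard_score(person)
print("PHS:", round(phs, 3))
print("Hazard relative to a score of 0:", round(relative_hazard(phs), 3))
```

In words: multiply each SNP's weight by how many copies of the allele the person carries, add everything up, and exponentiate the difference from a reference score to get how many times higher (or lower) the person's hazard is.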
To assess for replication, we first examined whether the ADGC Phase 1 derived predicted PHSs could stratify individuals into different risk strata within the ADGC Phase 2 cohort. We next evaluated the relationship between predicted age of AD onset and the empirical/actual age of AD onset using cases from ADGC Phase 2. We binned risk strata into percentile bins and calculated the mean of actual age in that percentile as the empirical age of AD onset. In a similar fashion, we additionally tested replication within the NACC subset classified at autopsy as having a high level of AD neuropathologic change [25].
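The percentile-binning step can be sketched as follows: sort the cases by PHS, cut them into equal-sized groups, and take the mean actual age of onset in each group. This is a minimal sketch under that assumption. The function name, the number of bins, and the example data are all hypothetical.

```python
from statistics import mean


def empirical_onset_by_bin(phs_scores, onset_ages, n_bins=4):
    """Mean actual onset age per PHS bin, ordered from lowest to highest PHS.

    Cases are sorted by PHS and split into `n_bins` near-equal groups.
    A bin with no cases yields None.
    """
    if len(phs_scores) != len(onset_ages):
        raise ValueError("phs_scores and onset_ages must have the same length")
    # Indices of cases, ordered by increasing PHS.
    order = sorted(range(len(phs_scores)), key=lambda i: phs_scores[i])
    n = len(order)
    means = []
    for b in range(n_bins):
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        means.append(mean(onset_ages[i] for i in members) if members else None)
    return means


# Made-up example: four cases, split into a low-PHS and a high-PHS half.
scores = [0.1, 0.9, 0.5, 0.3]
ages = [80, 70, 74, 78]
print(empirical_onset_by_bin(scores, ages, n_bins=2))
```

If the score works as intended, the mean onset age should fall as you move from the low-PHS bins to the high-PHS bins.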
Because case-control samples cannot provide the proper baseline hazard [16], we used the previously reported annualized incidence rates by age, estimated from the general United States of America (US) population [17]. For each participant, by combining the overall population-derived incidence rates [17] and genotype-derived PHS, we calculated an individual’s ‘instantaneous risk’ for developing AD, based on their genotype and age (for additional details see Supporting Information). To independently assess the predicted instantaneous risk, we evaluated longitudinal follow-up data from 2,724 cognitively normal older individuals from the NIA ADC with at least 2 years of clinical follow-up. We assessed the number of cognitively normal individuals progressing to AD as a function of the predicted PHS risk strata and examined whether the predicted PHS-derived incidence rate reflects the empirical/actual progression rate using a Cochran-Armitage trend test.
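To make the 'instantaneous risk' idea concrete: one simple reading, assuming the proportional hazards form, is that a person's annual incidence at a given age equals the population incidence at that age multiplied by exp(PHS - reference PHS). The sketch below follows that reading only. The incidence table, the reference score, and the function name are placeholders for illustration. The actual rates come from the cited US population estimates, and the exact formula is in the paper's Supporting Information.

```python
import math

# Placeholder annual incidence (new cases per person-year) by age.
# Illustrative numbers only, NOT the published US population rates.
BASELINE_INCIDENCE = {65: 0.002, 75: 0.010, 85: 0.030}


def instantaneous_risk(age, phs, reference_phs=0.0, baseline=BASELINE_INCIDENCE):
    """Population incidence at `age`, scaled by exp(phs - reference_phs).

    `reference_phs` stands for the score of a person at average risk
    (an assumption of this sketch).
    """
    if age not in baseline:
        raise KeyError(f"no baseline incidence for age {age}")
    return baseline[age] * math.exp(phs - reference_phs)


# Two hypothetical 75-year-olds: one at the reference score, one above it.
print(instantaneous_risk(75, phs=0.0))
print(instantaneous_risk(75, phs=0.5))
```

So age sets the starting rate through the population table, and the genotype moves that rate up or down through the exponential factor.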
We examined the association between our PHS and established in vivo and pathologic markers of AD neurodegeneration. Using linear models, we assessed whether the PHS was associated with Braak stage for NFTs and CERAD score for neuritic plaques, as well as CSF Aβ1-42 and CSF total tau. Using linear mixed effects models, we also investigated whether the PHS was associated with longitudinal CDR-SB score and volume loss within the entorhinal cortex and hippocampus. In all analyses, we co-varied for the effects of age and sex.
- Yang J, Ferreira T, Morris AP, et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat Genet 2012;44:369-U170.
- Dudbridge F. Power and predictive accuracy of polygenic risk scores. PLoS Genet 2013;9.
Thanks!
I have been studying this paper but haven't figured out how to calculate that.
Would you explain this in simple English, or point me to an equation or something?
It is too difficult to understand for a non-native English speaker who has no statistical background.
Thank you!