Question

Polygenic Risk Score Calculation: Do we need to apply the same p-value threshold on all 22 chromosomes?

2.9 years ago

Mengna Zhang ▴ 10

Hi there,

I am using PRSice-2 to calculate the polygenic risk score for the 22 chromosomes one by one. My understanding is that, since the 22 chromosomes are independent of each other, our goal is to find the best set of SNPs on each chromosome and then merge them into the final best "representative" set of SNPs for the phenotype. So we can use different C+T thresholds on different chromosomes to achieve that goal, right?

For example, my phenotype is height. On chromosome 1, PRSice-2 gave me the best set of SNPs (snp1, snp2, snp3) with C+T thresholds: r2 = 0.01, p-value threshold = 0.001;
on chromosome 2, PRSice-2 gave me the best set of SNPs (snp4, snp5) with C+T thresholds: r2 = 0.01, p-value threshold = 0.01.

Can I then report that snp1, snp2, snp3, snp4, and snp5 are associated with height? Or do I need to apply the same C+T threshold on every chromosome?

PRS PRSice-2 • 2.0k views

updated 2.9 years ago by Sam ★ 4.8k • written 2.9 years ago by Mengna Zhang ▴ 10

score 3 · Accepted Answer · 2022-01-03

While the per-chromosome calculation is faster, there are a number of gotchas that might invalidate your results. Perhaps the #1 problem with this approach is that PRSice's default is --score avg, which divides the PRS by the number of alleles used to calculate it (this helps account for individual genotype missingness). As a result, you cannot reliably add up the per-chromosome PRS to generate a genome-wide score. For that you need to use --score sum, which can be affected by individual genotype missingness.
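To see why averaged scores do not add up, here is a minimal sketch in Python. The numbers and field names (`weighted_sum`, `n_alleles`) are made up for illustration and are not PRSice output; the only assumption is the one stated above, that the averaged score is the summed score divided by the number of alleles used.

```python
# Hypothetical numbers for ONE sample on two chromosomes (illustration only).
# weighted_sum: sum of effect size x allele count over the SNPs used
# n_alleles:    number of alleles that went into that sum
chromosomes = [
    {"name": "chr1", "weighted_sum": 0.6, "n_alleles": 6},
    {"name": "chr2", "weighted_sum": 0.2, "n_alleles": 4},
]


def avg_score(weighted_sum, n_alleles):
    """Averaged score: summed score divided by the number of alleles used."""
    return weighted_sum / n_alleles


# Naively adding the per-chromosome averaged scores.
sum_of_avgs = sum(
    avg_score(c["weighted_sum"], c["n_alleles"]) for c in chromosomes
)

# What an averaged genome-wide score would be: total sum / total alleles.
total_sum = sum(c["weighted_sum"] for c in chromosomes)
total_alleles = sum(c["n_alleles"] for c in chromosomes)
genome_wide_avg = total_sum / total_alleles

print("sum of per-chromosome averages:", round(sum_of_avgs, 4))
print("genome-wide average:           ", round(genome_wide_avg, 4))
print("genome-wide sum:               ", round(total_sum, 4))
```

The two averaged quantities differ because each chromosome is divided by its own allele count, whereas the summed scores simply add across chromosomes.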

Once you have handled that, you can use --all-score to generate the PRS for all samples at all p-value thresholds on every chromosome. You can then add up the per-chromosome PRS at each p-value threshold and perform the required regression to identify the best threshold. In other words, the best threshold is chosen once for the genome-wide score, not separately for each chromosome.
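The merge-then-regress step could look roughly like the sketch below. Everything in it is a simplification for illustration: the nested dictionary stands in for per-chromosome summed scores that you would have parsed from your own output files, the sample IDs and values are invented, and a plain R² from a simple linear regression replaces the full model (covariates, permutation, etc.) that you would use in practice.

```python
from statistics import mean


def r_squared(x, y):
    """R^2 of a simple linear regression of y on x (squared Pearson r)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical summed scores: chromosome -> p-value threshold -> sample -> PRS
per_chrom = {
    "chr1": {
        0.001: {"s1": 0.1, "s2": 0.4, "s3": 0.2, "s4": 0.5},
        0.01: {"s1": 0.3, "s2": 0.3, "s3": 0.1, "s4": 0.6},
    },
    "chr2": {
        0.001: {"s1": 0.2, "s2": 0.3, "s3": 0.1, "s4": 0.6},
        0.01: {"s1": 0.5, "s2": 0.1, "s3": 0.4, "s4": 0.2},
    },
}
# Hypothetical phenotype values (e.g. height in cm).
phenotype = {"s1": 160.0, "s2": 172.0, "s3": 163.0, "s4": 181.0}

samples = sorted(phenotype)
thresholds = sorted(next(iter(per_chrom.values())))

# Genome-wide PRS per threshold: add the per-chromosome summed scores.
genome_wide = {
    t: [sum(per_chrom[c][t][s] for c in per_chrom) for s in samples]
    for t in thresholds
}

# Fit each genome-wide score against the phenotype and keep the best one.
pheno = [phenotype[s] for s in samples]
fit = {t: r_squared(genome_wide[t], pheno) for t in thresholds}
best_threshold = max(fit, key=fit.get)

print("R^2 per threshold:", {t: round(v, 3) for t, v in fit.items()})
print("best threshold:", best_threshold)
```

Note that the same threshold is used on every chromosome when the scores are added, so the model fit is always evaluated on a genome-wide score.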

The main problem with this per-chromosome approach is that, while you do speed up the analysis by parallelizing across chromosomes, it significantly increases the potential for error. In fact, in the latest version of PRSice-2 (v2.3.5), if you can use multi-threading, I don't think running by chromosome and then merging will give you any speed advantage over running PRSice directly on all chromosomes, unless you are working with imputed data, where, because of a bug, clumping might require far more memory than is available.

Hope this helps