Question

How to estimate a polygenic risk score (PRS) for one individual using the scoring files from PGSCatalog?

3

3.4 years ago

Alejandro Rojo ▴ 30

Hi all,

I have an annotated VCF file for one individual, and I want to estimate that individual's polygenic risk score (PRS) for a certain trait using the scoring files from the PGSCatalog. The scoring file contains the SNP ID, the reference and alternative alleles, and the weights. How can I estimate the PRS directly from the scoring file, without the classical approach that requires GWAS summary data, target data, etc.?

Thank you very much for your consideration.

prs genome snp • 4.6k views

updated 3.0 years ago by lassefolkersen ▴ 50 • written 3.4 years ago by Alejandro Rojo ▴ 30

score 1 · Answer 1 · 2021-06-24

1

3.4 years ago

zx8754 12k

PGSCatalog has all you need to calculate the score. For each SNP, if the individual carries the effect allele, multiply the number of copies by that SNP's beta (the weight in the scoring file). Do the same for all SNPs, then sum.
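Written out as a formula (the symbols are my own notation, not taken from the PGSCatalog files): with $M$ SNPs in the scoring file, $\beta_j$ the weight of SNP $j$, and $d_j$ the number of copies of the effect allele the individual carries at SNP $j$, the score would be

$$
\mathrm{PRS} = \sum_{j=1}^{M} \beta_j \, d_j, \qquad d_j \in \{0, 1, 2\}
$$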

3.4 years ago by zx8754 12k

0

Thank you very much for your answer. One last question: since each allele represents a single-point mutation, should I use dummy coding to transform the nucleotides into numbers and then multiply by the betas, or is there another approach?

3.4 years ago by Alejandro Rojo ▴ 30

1

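Yes: count the copies of the effect allele. If the effect allele is "A" and the genotype is "A A", the contribution is 2 * beta; for "A G" it is 1 * beta, and for "G G" it is 0.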
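A minimal Python sketch of that counting, in case it helps. The function names, the dictionary layout, and the toy SNPs and betas are all made up for illustration; they are not a PGSCatalog format or API. It assumes genotypes are already given as nucleotides on the same strand as the scoring file.

```python
def effect_allele_dosage(genotype, effect_allele):
    """Count copies of the effect allele in a genotype such as 'A A', 'A/G' or 'A|G'."""
    alleles = genotype.replace("/", " ").replace("|", " ").split()
    return sum(1 for allele in alleles if allele == effect_allele)


def polygenic_score(genotypes, scoring):
    """Sum dosage * beta over the SNPs in the scoring table.

    genotypes: {snp_id: 'A G'}                 (hypothetical layout)
    scoring:   {snp_id: (effect_allele, beta)} (hypothetical layout)
    """
    total = 0.0
    for snp_id, (effect_allele, beta) in scoring.items():
        if snp_id in genotypes:  # SNPs without a genotype are simply skipped here
            total += effect_allele_dosage(genotypes[snp_id], effect_allele) * beta
    return total


# Toy data, invented for the example
scoring = {"rs1": ("A", 0.5), "rs2": ("T", -0.25), "rs3": ("G", 0.125)}
genotypes = {"rs1": "A A", "rs2": "C T", "rs3": "C C"}

score = polygenic_score(genotypes, scoring)
print(score)  # 2*0.5 + 1*(-0.25) + 0*0.125
```

Skipping SNPs that have no genotype is only one possible choice; whether that is appropriate depends on why the genotype is missing.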

3.4 years ago by zx8754 12k

0

Does anyone have any software or script that performs these calculations?

3.2 years ago by Leandro • 0

1

It is a one-liner: multiply the genotypes by the coefficients and sum them.
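For example, in plain Python, assuming the genotypes have already been converted to effect-allele counts and are in the same SNP order as the coefficients (the numbers below are invented):

```python
# Effect-allele counts (0, 1 or 2) and weights, in the same SNP order (toy values)
dosages = [2, 1, 0, 1]
betas = [0.5, -0.25, 0.125, 1.0]

# The "one-liner": multiply element-wise and sum
prs = sum(d * b for d, b in zip(dosages, betas))
print(prs)
```

Getting the two lists aligned to the same SNPs and the same effect allele is the part this line takes for granted.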

3.2 years ago by German.M.Demidov ★ 2.9k

0

Maybe try PRSice-2: https://www.prsice.info/

3.2 years ago by zx8754 12k

score 0 · Answer 2 · 2021-11-07

Hi, there's a new Nextflow module at nf-core, imputeme, that can do that. It's for exactly your use case, and I believe it handles the key things asked here. I disagree that it is a "one-liner", as some comments suggest, for several reasons. A main one is that OP has an annotated VCF file, and VCF files typically contain no record at positions where the individual is homozygous reference, whereas PGS Catalog data does not necessarily have the effect allele matched to the ref and alt notation. Oh, and don't let the name trick you: when inputting whole-genome sequence data, no imputation takes place. That's only for the microarray-based inputs. Here's the link, it should fit right into any Nextflow pipeline: https://github.com/nf-core/modules/tree/master/modules/imputeme/vcftoprs
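To make the ref/alt matching point concrete, here is a rough Python sketch of how the effect-allele count could be derived per scoring-file SNP. This is not how the imputeme module is implemented; the function name and the record layout are hypothetical. It assumes that a site missing from a plain (non-gVCF) VCF was covered and is homozygous reference, that the reference allele at the site is known, and that the scoring file and the VCF are on the same strand and genome build.

```python
def dosage_from_vcf(record, effect_allele, ref_allele):
    """Count copies of the effect allele at one scoring-file SNP.

    record: None if the site has no line in the VCF, otherwise a tuple
            (ref, alts, gt) with alts a list of ALT alleles and gt a
            genotype string such as '0/1' or '1|1' (hypothetical layout).
    Returns 0, 1 or 2, or None if the genotype call is missing.
    """
    if record is None:
        # Assumption: an absent site means homozygous reference,
        # so the count is 2 only when the effect allele IS the reference.
        return 2 if effect_allele == ref_allele else 0

    ref, alts, gt = record
    alleles = [ref] + list(alts)          # index 0 = REF, 1.. = ALT alleles
    indices = gt.replace("|", "/").split("/")
    if "." in indices:
        return None                        # no-call, e.g. './.'
    return sum(1 for i in indices if alleles[int(i)] == effect_allele)


# Toy checks with invented records
print(dosage_from_vcf(None, "A", "A"))                 # site absent, effect allele is REF
print(dosage_from_vcf(("A", ["G"], "0/1"), "G", "A"))  # heterozygous, effect allele is ALT
```

The first case is where a naive "count the ALT alleles" approach goes wrong: when the effect allele is the reference allele, an absent site contributes two copies rather than zero. Strand flips, multi-allelic indels, and low-coverage regions are not handled here.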