I'm very new to GWAS and PRS analysis, so my question is simple, but I cannot find a straightforward answer anywhere:
Is it possible, with a comprehensive database of risk scores associated with traits, to calculate polygenic risk scores for a specific genome? By this, I mean whether I can "diagnose" a genome for all diseases already studied by previous genome-wide association studies.
I'd assume it's not that simple - if that was the case, every paper would reference it.
Technically, it is possible, by doing the following:
- constructing predictive models using all statistically significant GWAS hits for each condition / phenotype
- cross-validating and refining the models on training and testing data
- making model predictions on new data
Some extra points to consider:
- statistically significant GWAS hits may not necessarily result in disease or confer a particular phenotype; instead, they may only increase or decrease risk (that is to say, many of these variants have incomplete penetrance)
- getting samples to do this work will be difficult
- 'polygenic risk score' is a generic term and there are many ways to construct one. Most are built from the beta coefficients of the regression model fit
- you should consider how you are going to build and fit the model. Perhaps something along the lines of elastic-net or ridge regression would be a start. Others have used lasso-penalised regression in the past to do something similar for breast cancer somatic variants.
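To make the fit / validate / predict steps a bit more concrete, here is a minimal sketch of a ridge-penalised linear model, written in plain Python with batch gradient descent so that it runs without extra packages. Everything in it is an assumption for illustration: the genotypes and the continuous phenotype are simulated, `fit_ridge`, `predict` and `mse` are hypothetical helpers, and a single hold-out split stands in for proper cross-validation. For real data you would more likely reach for an established implementation of ridge or elastic-net.

```python
import random

def predict(b0, w, X):
    """Linear predictor: intercept plus weighted sum of allele dosages."""
    return [b0 + sum(wj * xj for wj, xj in zip(w, row)) for row in X]

def fit_ridge(X, y, lam=1.0, lr=0.1, epochs=1000):
    """Fit y ~ b0 + X.w with an L2 penalty on w (not on the intercept),
    using batch gradient descent. Hypothetical helper for illustration."""
    n, p = len(X), len(X[0])
    b0, w = 0.0, [0.0] * p
    for _ in range(epochs):
        errors = [pred - yi for pred, yi in zip(predict(b0, w, X), y)]
        grad_b0 = sum(errors) / n
        grad_w = [
            (sum(e * row[j] for e, row in zip(errors, X)) + lam * w[j]) / n
            for j in range(p)
        ]
        b0 -= lr * grad_b0
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
    return b0, w

def mse(y_true, y_pred):
    """Mean squared error between observed and predicted phenotype."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

# Simulated data (assumption): 200 individuals, 5 SNPs coded as 0/1/2
# copies of the effect allele, two of which have no true effect.
random.seed(0)
true_w = [0.5, -0.3, 0.0, 0.2, 0.0]
X = [[random.choice([0, 1, 2]) for _ in true_w] for _ in range(200)]
y = [sum(wj * xj for wj, xj in zip(true_w, row)) + random.gauss(0, 0.1)
     for row in X]

# Hold out the last 50 individuals as "new data" for evaluation.
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

b0, w = fit_ridge(X_train, y_train, lam=1.0)
test_mse = mse(y_test, predict(b0, w, X_test))
print("estimated weights:", [round(wj, 3) for wj in w])
print("hold-out MSE:", round(test_mse, 4))
```

The penalty `lam` is the knob you would tune by cross-validation; larger values shrink the weights harder towards zero.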
Note that replacing 'predictive models' with 'AI' or 'machine learning algorithm' will likely increase your chance of funding for the work, if that is ultimately what you want.
You can also look into our tutorial. However, I guess what you are asking is slightly different, in that you already have a PRS associated with a disease and you have a new genome that you want to calculate the score on. For that, you'll need to know which SNPs were used in its construction and what the weights are (these are usually beta coefficients from a GWAS, either used as is (e.g. PRSice) or regularized / shrunk (e.g. LDpred, lassosum, PRS-CS, etc.)). Once you have both pieces of information, you'll be able to re-calculate the score.
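As a rough sketch of that re-calculation, assuming a simple additive score (the weight of each SNP multiplied by the number of effect alleles carried, summed over SNPs), something like the following would do. The SNP IDs, weights and genotypes are made up, and `compute_prs` is a hypothetical helper, not a function from PRSice or any other tool.

```python
from typing import Dict, List, Tuple

# Hypothetical score definition: SNP ID -> (effect allele, weight).
# In practice this comes from the published score / summary statistics.
WEIGHTS: Dict[str, Tuple[str, float]] = {
    "rs0001": ("A", 0.12),
    "rs0002": ("T", -0.05),
    "rs0003": ("G", 0.30),
}

# Hypothetical genotype of one individual: SNP ID -> the two alleles carried.
GENOTYPE: Dict[str, Tuple[str, str]] = {
    "rs0001": ("A", "G"),  # one copy of the effect allele
    "rs0002": ("T", "T"),  # two copies of the effect allele
    # rs0003 was not genotyped in this individual
}

def compute_prs(
    weights: Dict[str, Tuple[str, float]],
    genotype: Dict[str, Tuple[str, str]],
) -> Tuple[float, List[str]]:
    """Return the additive score and the list of score SNPs that were
    missing from the genotype (these contribute nothing to the sum)."""
    score = 0.0
    missing = []
    for snp, (effect_allele, beta) in weights.items():
        alleles = genotype.get(snp)
        if alleles is None:
            missing.append(snp)
            continue
        # Dosage = number of copies of the effect allele (0, 1 or 2).
        dosage = sum(1 for allele in alleles if allele == effect_allele)
        score += beta * dosage
    return score, missing

score, missing = compute_prs(WEIGHTS, GENOTYPE)
print(f"PRS = {score:.3f}; score SNPs missing from genotype: {missing}")
```

Note that this skips the fiddly parts of real data: the effect allele has to be matched to your genotype calls (including strand), and the raw number only means something relative to the distribution of scores in a reference population.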
Thank you, Sam. So, it seems that I can achieve that with the GWAS Catalog, since it collects many different GWAS and most of the SNPs have a beta coefficient associated with them.
Yes, you can, but beware that using only the significant SNPs tends to generate an underpowered PRS, and if the study of interest uses SNPs that fall outside the genome-wide significance threshold, then it is likely that you won't have the information required to regenerate the score.
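To illustrate the problem, here is a small sketch that checks how many of a score's SNPs would survive if you only had access to genome-wide significant hits. The SNP IDs, betas and p-values are invented, and I am assuming the conventional threshold of p < 5e-8; the variable names are just for this example.

```python
GENOME_WIDE_P = 5e-8  # conventional genome-wide significance threshold

# Hypothetical SNPs (and weights) used by the published score of interest.
score_snps = {"rs0001": 0.12, "rs0002": -0.05, "rs0003": 0.30, "rs0004": 0.02}

# Hypothetical full summary statistics: SNP ID -> (beta, p-value).
summary_stats = {
    "rs0001": (0.12, 1e-12),
    "rs0002": (-0.05, 3e-6),
    "rs0003": (0.30, 2e-9),
    "rs0004": (0.02, 0.04),
}

# What a significant-hits-only resource would contain.
significant = {snp for snp, (_, p) in summary_stats.items() if p < GENOME_WIDE_P}

recoverable = sorted(set(score_snps) & significant)
lost = sorted(set(score_snps) - significant)

print(f"recoverable from significant hits only: {recoverable}")
print(f"weights you would be missing: {lost}")
```

In this toy case half of the score's SNPs sit below the significance threshold, so the score could not be regenerated from the significant hits alone.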
So, it's not as straightforward as simply calculating the PRS based on our genome's mapped SNPs, I see.
Thank you for the comprehensive answer!
Ah, if you want a more automated way to do it, then I would recommend taking a look at PRSice by Sam.
Thank you for both of your answers