Hi there,
I am trying to calculate a polygenic risk score (PRS/PGS) for a given set of individuals (the input can be a VCF or bgen file, it does not matter) using pre-computed weights (beta values for each variant) from the PGS Catalog. For example, say I want to calculate the PGS000001 score for my individuals using the PGS000001 weights from the PGS Catalog. My intention is only to calculate the score for these individuals, not to develop new PGS scores or validate existing ones.
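To be concrete, by "calculating the score" I mean roughly the weighted sum of effect-allele dosages sketched below. This is only a minimal illustration under my own assumptions: the variant IDs, beta values, dosages, and the function name `polygenic_score` are all made up (they are not the real PGS000001 weights), and it assumes the genotypes are already coded as dosages of the same effect allele used in the scoring file.

```python
import math

# Hypothetical scoring file: variant ID -> beta for the effect allele.
# These values are invented for illustration only.
weights = {
    "rs0001": 0.12,
    "rs0002": -0.05,
    "rs0003": 0.30,
}

# Hypothetical effect-allele dosages (0 to 2) per individual.
# NaN marks a variant that was not called or not imputed.
dosages = {
    "sample1": {"rs0001": 2.0, "rs0002": 1.0, "rs0003": 0.0},
    "sample2": {"rs0001": 1.0, "rs0002": float("nan"), "rs0003": 1.0},
}


def polygenic_score(sample_dosages, weights):
    """Return (score, n_variants_used) for one individual.

    The score is the sum over variants of beta * dosage. Variants that
    are absent or NaN for this individual are skipped, which is one
    possible way to handle missing data (an assumption of this sketch).
    """
    total = 0.0
    used = 0
    for variant_id, beta in weights.items():
        dosage = sample_dosages.get(variant_id, float("nan"))
        if math.isnan(dosage):
            continue  # missing variant: contributes nothing to the sum
        total += beta * dosage
        used += 1
    return total, used


for sample_id, sample_dosages in dosages.items():
    score, used = polygenic_score(sample_dosages, weights)
    print(f"{sample_id}: score={score:.3f} from {used}/{len(weights)} variants")
```

In this toy case the second individual is scored from only two of the three variants, which is the situation behind my question about imputation below.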
I have two questions:
- Should I use imputed data, or just the variants that have actually been called (after germline calling)? Do I gain anything by imputing the data?
- Should I filter the variants or individuals? For example, with filters such as missing call rates (--geno in plink2), --indep-pairwise, --hwe, --maf. Similarly, should I discard any individuals based on some criteria? I know that when developing a PGS, the SNP and sample properties must be considered strictly, but is the same true when just calculating scores using pre-calculated beta values deposited in the PGS Catalog?
Thank you!