Hello,
I'm new to GWAS and I came across a question while reading this tutorial paper.
PRS distribution
The central limit theorem dictates that if a PRS is based on a sum of independent variables (here, SNPs) with identical distributions, then the PRS of a sample should approximate the normal (Gaussian) distribution. This is true even if the PRS has extremely low predictive accuracy, since the sum of random numbers is approximately normally distributed, and so a normally distributed PRS in a sample should not be considered as validation of the accuracy of a PRS or of the liability threshold model. However, strong violations of these assumptions, such as the use of many correlated SNPs or a sample of heterogenous ancestry (thus, SNPs with markedly different genotype distributions), can lead to non-normal PRS distributions. Thus, inspection of PRS distributions may highlight calculation errors or problems of population stratification in the target sample for which researchers did not adequately control.
It says the PRS distribution usually follows a Gaussian distribution, but I wonder why. If the target data consist of two phenotype groups that can be nicely distinguished by the PRS, I think the PRS distribution in the target data could look like a mixture of two Gaussian distributions.
Could someone please explain if I've got this wrong?
Thank you.
I'm not following your question. Just because you have two phenotype groups in mind (e.g. professional basketball players vs. others) doesn't mean the PRS for height isn't normally distributed for people in general.
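For what it's worth, the "people in general" case is easy to check with a toy simulation. This is only a rough sketch under made-up assumptions: the allele frequencies, the effect sizes, and the function names `simulate_prs` and `standardized_moment` are all hypothetical, SNPs are drawn independently, and the weights are pure noise, so the score has no predictive value at all. The allele frequencies differ a little between SNPs, so the terms are not strictly identically distributed, but the sum still comes out close to normal.

```python
import random
import statistics

def simulate_prs(n_individuals=2000, n_snps=200, seed=0):
    """Toy PRS: weighted sum of independent SNP genotypes (0, 1 or 2 alleles)."""
    rng = random.Random(seed)
    # Hypothetical allele frequencies and effect sizes (not from any real GWAS).
    freqs = [rng.uniform(0.05, 0.5) for _ in range(n_snps)]
    betas = [rng.gauss(0.0, 1.0) for _ in range(n_snps)]
    scores = []
    for _ in range(n_individuals):
        score = 0.0
        for p, beta in zip(freqs, betas):
            # Two independent allele draws per SNP give a genotype of 0, 1 or 2.
            genotype = (rng.random() < p) + (rng.random() < p)
            score += beta * genotype
        scores.append(score)
    return scores

def standardized_moment(values, order):
    """Mean of ((x - mean) / sd) ** order; 0 for order 3 and 3 for order 4 if normal."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** order for v in values) / len(values)

scores = simulate_prs()
print("skewness:", round(standardized_moment(scores, 3), 3))
print("excess kurtosis:", round(standardized_moment(scores, 4) - 3.0, 3))
```

Skewness and excess kurtosis both near zero are what you would expect from a roughly normal distribution, even though this score predicts nothing.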
For the real-world case, you are right. But the problem is, if I have target data containing 1000 professional basketball players and 1000 others, the PRS (for height) distribution should look bimodal (if the PRS can clearly distinguish the phenotype).
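Whether such a 50/50 sample actually looks bimodal depends on how far apart the two groups' mean PRS are. As a minimal sketch, assume each group's PRS is normal with the same standard deviation and the groups are equal in size (assumptions for illustration, not something stated in the paper). An equal mixture of two such normals has two peaks only when the means differ by more than 2 standard deviations. The function names below are hypothetical, and `separation` is the difference in group means measured in within-group SDs.

```python
import math

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, separation):
    # Equal mix of N(0, 1) ("others") and N(separation, 1) ("players").
    return 0.5 * normal_pdf(x, 0.0, 1.0) + 0.5 * normal_pdf(x, separation, 1.0)

def count_modes(separation, grid_points=2001):
    """Count local maxima of the mixture density on a grid symmetric about its centre."""
    lo, hi = -5.0, separation + 5.0
    step = (hi - lo) / (grid_points - 1)
    dens = [mixture_pdf(lo + i * step, separation) for i in range(grid_points)]
    return sum(
        1
        for i in range(1, grid_points - 1)
        if dens[i] > dens[i - 1] and dens[i] > dens[i + 1]
    )

for d in (0.5, 1.0, 3.0):
    print("separation", d, "-> modes:", count_modes(d))
```

So under these assumptions a mixed sample gives a visibly bimodal PRS histogram only if the score separates the groups very strongly; with a smaller separation the pooled distribution is just wider than either group's and still has one peak.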