To calculate polygenic risk scores (PRS) from genotype data for conditions such as cardiovascular disease or pancreatic cancer, you need per-SNP effect sizes, which normally come from the summary statistics of Genome-Wide Association Studies (GWAS). With genotypes for only two individuals and no phenotype data, you cannot estimate those effect sizes yourself. However, you can take them from GWAS summary data published by other studies. Here's how:
Understand the Limitations: You can compute a score for each of your two individuals, but with only two samples you have no distribution to compare the scores against and no way to validate them. It's crucial to acknowledge these limitations.
Utilize External GWAS Summaries: You can use summary statistics from large-scale GWAS of cardiovascular disease or pancreatic cancer. These are available in public databases like the GWAS Catalog (https://www.ebi.ac.uk/gwas/) and dbGaP (https://www.ncbi.nlm.nih.gov/gap/). You'll need to download these summary statistics, which usually contain:
- SNP IDs
- Effect alleles
- Beta coefficients (effect sizes)
- Standard errors
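As a rough illustration of what such a table looks like once loaded, here is a minimal Python sketch. The column names (SNP, A1, BETA, SE), the rsIDs and the values are made up for the example; real files name and order their columns differently, so check the header of the file you download.

```python
import csv
import io

# Hypothetical excerpt of a summary-statistics file. Column names, rsIDs and
# values are invented for illustration and differ between studies.
EXAMPLE_SUMSTATS = """SNP\tA1\tBETA\tSE
rs0000001\tA\t0.12\t0.02
rs0000002\tT\t-0.08\t0.03
rs0000003\tG\t0.05\t0.01
"""

def read_sumstats(handle):
    """Return {rsid: (effect_allele, beta, standard_error)} from a tab-separated table."""
    reader = csv.DictReader(handle, delimiter="\t")
    table = {}
    for row in reader:
        table[row["SNP"]] = (row["A1"].upper(), float(row["BETA"]), float(row["SE"]))
    return table

sumstats = read_sumstats(io.StringIO(EXAMPLE_SUMSTATS))
print(sumstats["rs0000002"])  # ('T', -0.08, 0.03)
```

For a real file you would pass an opened file handle instead of the in-memory string.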
Prepare Your Genotype Data: Ensure your genotype data is properly formatted. This typically involves a list of SNPs with their genotypes for each individual. You might need to convert your current format (Rs ID, Chr name and position, Genotype) into a standard format like VCF or PLINK PED/MAP.
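Whatever file format you end up with, the quantity a PRS needs from each genotype is the number of copies of the effect allele (0, 1 or 2). A minimal sketch of that conversion, assuming genotypes are written as two-letter calls such as AG or A/G; the function name is hypothetical, and the sketch does not handle indels or strand differences between your array and the GWAS:

```python
def effect_allele_dosage(genotype, effect_allele):
    """Count copies (0, 1 or 2) of the effect allele in a call such as 'AG' or 'A/G'.

    Returns None for missing or unparseable calls (for example '--').
    """
    alleles = [a for a in genotype.upper() if a in "ACGT"]
    if len(alleles) != 2:
        return None
    return alleles.count(effect_allele.upper())

print(effect_allele_dosage("AG", "A"))   # 1
print(effect_allele_dosage("G/G", "A"))  # 0
print(effect_allele_dosage("--", "A"))   # None
```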
Choose a PRS Calculation Tool: Tools like PRSice-2 (https://github.com/choishingwan/PRSice) are designed for PRS analysis and take GWAS summary statistics plus target genotype data as input. LDpred is another useful tool; it re-weights GWAS effect sizes to account for linkage disequilibrium, but it still needs summary statistics with effect sizes as input.
Perform the Calculation: Using a tool like PRSice-2, you would input your genotype data (converted to the tool's input format) and the external GWAS summary statistics. The tool calculates a PRS for each individual as a weighted sum of effect-allele counts, with the GWAS effect sizes as weights.
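To make the arithmetic concrete: the basic score is the sum, over SNPs, of the GWAS beta multiplied by the number of effect alleles the person carries. The Python sketch below shows only that sum, with made-up rsIDs, alleles and betas. It is not how PRSice-2 is implemented, and it leaves out the steps such tools add, such as LD clumping, p-value thresholding and strand checks.

```python
def polygenic_score(genotypes, weights):
    """Sum beta * effect-allele dosage over SNPs with a usable genotype call.

    genotypes: {rsid: 'AG'}; weights: {rsid: (effect_allele, beta)}.
    Returns (score, number_of_snps_used).
    """
    score, used = 0.0, 0
    for rsid, (effect_allele, beta) in weights.items():
        call = genotypes.get(rsid, "")
        alleles = [a for a in call.upper() if a in "ACGT"]
        if len(alleles) != 2:
            continue  # SNP missing or not called for this person
        score += beta * alleles.count(effect_allele.upper())
        used += 1
    return score, used

# Invented weights and genotypes, for illustration only.
weights = {"rs0000001": ("A", 0.12), "rs0000002": ("T", -0.08), "rs0000003": ("G", 0.05)}
person_1 = {"rs0000001": "AA", "rs0000002": "CT", "rs0000003": "GG"}
person_2 = {"rs0000001": "AG", "rs0000002": "TT", "rs0000003": "--"}

for name, person in (("person_1", person_1), ("person_2", person_2)):
    score, used = polygenic_score(person, weights)
    print(name, round(score, 4), "from", used, "SNPs")
```

Reporting the number of SNPs used matters: two people scored on different subsets of SNPs are not directly comparable.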
Interpretation: Be cautious when interpreting the PRS scores. A raw score has no meaning on its own; it only becomes informative when compared with the distribution of scores in a reference population of similar ancestry, and with only two samples you have no such distribution. These scores should be used as exploratory or illustrative data rather than as a basis for definitive conclusions about disease susceptibility.
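One common way to put a raw score in context is to express it relative to scores computed with the same SNPs and weights in a reference group. The sketch below assumes you have such reference scores; the numbers are invented for illustration and the function names are hypothetical.

```python
from statistics import mean, stdev

def standardize(score, reference_scores):
    """Express a raw score as a z-score relative to a reference distribution."""
    return (score - mean(reference_scores)) / stdev(reference_scores)

def fraction_below(score, reference_scores):
    """Share of reference individuals whose score is lower than `score`."""
    return sum(r < score for r in reference_scores) / len(reference_scores)

# Made-up reference scores, for illustration only.
reference_scores = [0.10, 0.18, 0.22, 0.26, 0.31, 0.35, 0.40]
raw_score = 0.35

print(round(standardize(raw_score, reference_scores), 2))
print(round(fraction_below(raw_score, reference_scores), 2))
```

Even a high z-score or percentile indicates relative genetic predisposition only; it is not a diagnosis.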
Consider Alternative Approaches: If your goal is to understand the genetic architecture of the disease, consider methods beyond simple PRS calculation. You might explore:
- LD score regression (https://www.nature.com/articles/ng.3211) to estimate SNP heritability and to separate polygenic signal from confounding, using GWAS summary statistics.
- GWAS in your own sample (if feasible): If you have more samples, perform a GWAS yourself and then calculate PRS using summary statistics from your GWAS or external sources.
In summary, while calculating a robust PRS for just two individuals is challenging, you can leverage external GWAS summary data to get an initial estimate. However, interpret the results cautiously due to the small sample size and lack of phenotype information. Focus on exploratory insights rather than definitive conclusions.
If you don't have phenotype data, consider this approach:
- Obtain external GWAS data: Download summary statistics from large-scale GWAS for your target disease (cardiovascular disease or pancreatic cancer). These datasets are publicly available and contain effect sizes (betas) for many SNPs.
- Calculate PRS: Use the downloaded summary statistics and calculate the PRS for your two individuals based on their genotypes. This will give you an estimate of their polygenic risk.
Keep in mind that without phenotype data you cannot check how well the PRS predicts disease in your samples, so you cannot validate or calibrate it. However, this approach can still give a rough indication of the genetic predisposition of your samples.
By using external GWAS summary data, you can still derive some information about polygenic risk even with no phenotype data. However, always interpret the results cautiously and consider the limitations of the analysis.
The number of samples required for a reliable PRS analysis depends on the target disease and the summary statistics you are using.
Sample size matters mainly in two places: the discovery GWAS and the validation of the score. Discovery GWAS for common diseases typically need thousands to hundreds of thousands of participants to estimate effect sizes with sufficient power, and rare diseases are harder still because cases are scarce. Scoring an individual does not itself require a large sample, but evaluating how well the score predicts disease does.
In your case, with only two samples, you cannot evaluate the PRS, and the scores may not reflect the true genetic predisposition. However, if you still proceed:
- Use robust summary statistics: Download summary statistics from high-quality GWAS studies to maximize the reliability of your score.
- Interpret cautiously: Recognize that the PRS will be less accurate and more prone to noise due to the small sample size.
By carefully considering these factors, you can still derive some meaningful insights, although the results should be treated as preliminary or exploratory.
How many samples do you have, for both cases and controls?
You need to work out your sample size before conducting a PRS analysis. PRS is just a method, and how well it works depends on your sample size.
Hope it helps!
Thank you so much for replying,
Actually, I have only the genotype results of two samples, along with SNP name, rsID, chromosome number and genomic position. The task given to me was to calculate a polygenic risk score and to check, based on the PRS, whether these people are susceptible to a particular disease.
I don't have any phenotype data, which PLINK requires. So is using other GWAS studies to get the effect allele and beta value the right way? Please let me know if I am going in the wrong direction.
If your task is only to determine whether people are susceptible to a particular disease based on SNPs, the odds ratio (OR) already gives you an answer. I think an OR is easier to interpret than a PRS, particularly for a preliminary study.
Yes, you may use other studies to obtain the SNPs you need, but I am not sure whether their beta values carry over to your samples. You are not going in the wrong direction, but you need to make sure your method is consistent with your experimental design.
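On the OR versus beta point: for case-control traits the two are directly related, since the beta is the natural logarithm of the odds ratio. If a study reports only ORs, they can be converted to PRS weights as in this small Python sketch (the function name is hypothetical, and the OR must refer to the same effect allele you count in the genotypes):

```python
import math

def odds_ratio_to_beta(odds_ratio):
    """Convert a per-allele odds ratio to a log-odds effect size (beta)."""
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    return math.log(odds_ratio)

print(round(odds_ratio_to_beta(1.2), 4))  # 0.1823
print(round(odds_ratio_to_beta(0.8), 4))  # -0.2231
```

An OR above 1 gives a positive beta (risk-increasing allele) and an OR below 1 gives a negative one.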
Hope it helps!