Hi there,
I am using PRSice-2 to calculate polygenic risk scores for the 22 chromosomes one by one. My understanding is that, since the 22 chromosomes are independent of each other, the goal is to find the best set of SNPs on each chromosome and then merge them into the final set of "representative" SNPs for the phenotype. So we can use different C+T thresholds on different chromosomes to achieve that, right?
For example, my phenotype is height,
on chromosome 1, PRSice-2 gave me the best set of SNPs (snp1, snp2, snp3) with C+T threshold: r2 = 0.01, p-value threshold = 0.001;
on chromosome 2, PRSice-2 gave me the best set of SNPs (snp4, snp5) with C+T threshold: r2 = 0.01, p-value threshold = 0.01;
Can I then report that snp1, snp2, snp3, snp4, and snp5 are associated with height? Do I need to apply the same C+T threshold on every chromosome?
Thank you, Sam! So when you said "And then you can add up the PRS for each p-value threshold, and perform the required regression to identify the best threshold.", does that mean the best p-value threshold must be the same for all chromosomes?
I did use --score sum to calculate the PRS. I chose to calculate the PRS by chromosome because the data I was given is organized by chromosome and is very large. With --score sum, why can't I apply different p-value thresholds to different chromosomes? Since they are independent of each other, can I merge the PRS on chr1 at p-value threshold 0.000001 with the PRS on chr2 at p-value threshold 0.001?
In theory, you can do that, but the interpretation will be more difficult. PRSice natively supports per-chromosome input:
--target chr#
if your input files are organized as chr1, chr2, etc.
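If you do keep the per-chromosome runs, here is a rough sketch of the "add up the PRS for each p-value threshold, then regress" idea. It is not what PRSice does internally. The toy scores, the phenotype values, and the function names are all made up for illustration, and the regression is a plain one-predictor fit with no covariates. It assumes every chromosome was scored with --score sum at the same thresholds and that samples are in the same order everywhere.

```python
# Toy sketch: sum per-chromosome PRS at each p-value threshold, then pick the
# threshold whose genome-wide score explains the most phenotypic variance.
# All data and names below are hypothetical.

def r_squared(x, y):
    """R^2 of a simple linear regression of y on x (squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)

def sum_across_chromosomes(per_chr):
    """{chrom: {threshold: [score per sample]}} -> {threshold: [summed score per sample]}"""
    totals = {}
    for scores_by_threshold in per_chr.values():
        for thr, scores in scores_by_threshold.items():
            if thr not in totals:
                totals[thr] = [0.0] * len(scores)
            totals[thr] = [t + s for t, s in zip(totals[thr], scores)]
    return totals

def best_threshold(totals, phenotype):
    """Return (threshold with the highest R^2, {threshold: R^2})."""
    r2 = {thr: r_squared(scores, phenotype) for thr, scores in totals.items()}
    return max(r2, key=r2.get), r2

# Made-up sum scores for 4 samples on 2 chromosomes at 2 thresholds.
per_chr = {
    "chr1": {0.001: [0.1, 0.2, 0.3, 0.4], 0.01: [0.4, 0.1, 0.3, 0.2]},
    "chr2": {0.001: [0.0, 0.1, 0.1, 0.2], 0.01: [0.1, 0.3, 0.0, 0.2]},
}
phenotype = [1.0, 2.0, 3.0, 4.0]  # made-up height values

totals = sum_across_chromosomes(per_chr)
best, r2 = best_threshold(totals, phenotype)
print("best threshold:", best)
```

Because one threshold is chosen for the whole genome, only a handful of models are compared. Picking a separate threshold per chromosome multiplies the number of combinations being tested, which is part of why the result becomes harder to interpret.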