PRSice and R lm() show different association analysis output (coefficient, SE and P)
0
0
Entering edit mode
9 months ago
Salahuddin • 0

Hey, I am struggling to understand Why am I getting different association analysis output (coefficient, SE and P) between PRSice and lm() function in R. Not only the values, even the direction of association is different. I am trying to do association analysis between PRS of my trait and MRI data in UKB. Feeding PRSice the standardized phenotype data.  

PRS PRSice2 UKBB • 849 views
ADD COMMENT
0
Entering edit mode

Fraction of output from PRSice:

Pheno   Coefficient Standard.Error  P   Threshold
Zcortical_vol2  0.447657    0.573909    0.435388    0.001
Zcortical_vol2  1.17785 1.46062 0.420018    0.01
Zcortical_vol2  1.34621 3.77899 0.721667    0.1
Zcortical_vol2  5.35868 5.51758 0.331455    0.25
Zcortical_vol2  8.51688 7.31505 0.244313    0.5
Zcortical_vol2  8.13715 8.40917 0.333226    0.75
Zcortical_vol2  7.3802  9.07108 0.415883    1

Fraction of out put from R using lm() function:

    Beta    SE  P value Threshold
cortical_vol2   -121    211.49  0.5672  0.001
cortical_vol2   75.2    214.63  0.7261  0.01
cortical_vol2   67.02   225.98  0.7668  0.1
cortical_vol2   5.16    228.77  0.982   0.25
cortical_vol2   -85.61  229.97  0.7097  0.5
cortical_vol2   -85.86  231.22  0.7104  0.75
cortical_vol2   -82.1   231.09  0.7224  1

My script for association analysis in PRSice where I used previously clumped PRS:

PRSice 2.3.5 (2021-09-20) 
https://github.com/choishingwan/PRSice
(C) 2016-2020 Shing Wan (Sam) Choi and Paul F. O'Reilly
GNU General Public License v3
If you use PRSice in any published work, please cite:
Choi SW, O'Reilly PF.
PRSice-2: Polygenic Risk Score Software for Biobank-Scale Data.
GigaScience 8, no. 7 (July 1, 2019)
2024-02-06 09:38:16
D:\ukb_genetic_data\PRSice_win64.exe \
    --a1 A1 \
    --a2 A2 \
    --allow-inter  \
    --bar-levels 0.001,0.01,0.1,0.25,0.5,0.75,1 \
    --base iPSYCH-PGC_ASD_Nov2017.gz \
    --base-info INFO:0.9 \
    --beta  \
    --binary-target 46F \
    --bp BP \
    --chr CHR \
    --cov mig_asd_cov \
    --cov-col scan_x_pos,scan_y_pos,scan_z_pos,Zage,Zsex,Zbatch,@Zpc[1-20] \
    --cov-factor Zsex,Zbatch \
    --extract ukb_distribution_011023.snp \
    --fastscore  \
    --geno 0.02 \
    --ignore-fid  \
    --no-clump  \
    --num-auto 22 \
    --out mri_asd_run_060224 \
    --pheno cerebelum_pheno_060224 \
    --pheno-col eid,Zcortical_vol2,ZCSF_vol2,ZGM_vol2,ZWM_vol2,ZTBV2,ZWMH_vol2,ZBrainStem_vol2,ZI_IVCerebellum_L_vol2,ZI_IVCerebellum_R_vol2,ZVCerebellum_L_vol2,ZVCerebellum_R_vol2,ZVICerebellum_L_vol2,ZVermisVICerebellum_vol2,ZVICerebellum_R_vol2,ZCrusICerebellum_L_vol2,ZVermisCrusICerebellum_vol2,ZCrusICerebellum_R_vol2,ZCrusIICerebellum_L_vol2,ZVermisCrusIICerebellum_vol2,ZCrusIICerebellum_R_vol2,ZVIIbCerebellum_L_vol2,ZVermisVIIbCerebellum_vol2,ZVIIbCerebellum_R_vol2,ZVIIIaCerebellum_L_vol2,ZVermisVIIIaCerebellum_vol2,ZVIIIaCerebellum_R_vol2,ZVIIIbCerebellum_L_vol2,ZVermisVIIIbCerebellum_vol2,ZVIIIbCerebellum_R_vol2,ZIXCerebellum_L_vol2,ZVermisIXCerebellum_vol2,ZIXCerebellum_R_vol2,ZXCerebellum_L_vol2,ZVermisXCerebellum_vol2,ZXCerebellum_R_vol2,ZI_IVCerebellum_vol2,ZVCerebellum_vol2,ZVICerebellum_vol2,ZVIIbCerebellum_vol2,ZVIIIaCerebellum_vol2,ZVIIIbCerebellum_vol2,ZIXCerebellum_vol2,ZXCerebellum_vol2,ZCrusICerebellum_vol2,ZCrusIICerebellum_vol2 \
    --print-snp  \
    --pvalue P \
    --seed 1435821109 \
    --snp SNP \
    --stat OR \
    --target ukb_imp_chr#_v3,ukb30172_imp_chr22_v3_s487324.sample \
    --thread 1 \
    --type bgen
ADD REPLY
1
Entering edit mode

Hi please only use one platform to ask question. What code did you use in R? One common reason why the results can differ might be how R handle some of the covariates (e.g. if e.g. Zbatch is numeric, PRSice will treat it as a factor because it was provided in cov-factor, but R will treat that as numeric automatically)

ADD REPLY
0
Entering edit mode

Thanks for your reply. That makes sense. I will give a try by making changes in covariates class.

However, my command for R has been,

lm (cortical_vol2 ~ Pt_0.001 + scan_x_pos + scan_y_pos + scan_z_pos + Zage + Zsex + Zbatch + Zpc1 + Zpc2 + Zpc3 + Zpc4 + Zpc5 + Zpc6 + Zpc7 + Zpc8 + Zpc9 + Zpc10 + Zpc11 + Zpc12 + Zpc13 + Zpc14 + Zpc15 + Zpc16 + Zpc17 + Zpc18 + Zpc19 + Zpc20, data = data1)

After trying correcting the PRSice command, still the outputs are different. Your suggesting highly appreciated. Output with PRSice association -

Pheno   Set Threshold   R2  P   Coefficient Standard.Error  Num_SNP
Zcortical_vol2  Base    0.5 3.66E-05    0.173625    0.00825 0.00606556  97903

Output with R association -

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.1025502  0.0047043 -21.799  < 2e-16 ***
Pt_0.5      -0.0020045  0.0049649  -0.404   0.6864

PRSice call log -

PRSice 2.3.5 (2021-09-20) 
https://github.com/choishingwan/PRSice
(C) 2016-2020 Shing Wan (Sam) Choi and Paul F. O'Reilly
GNU General Public License v3
If you use PRSice in any published work, please cite:
Choi SW, O'Reilly PF.
PRSice-2: Polygenic Risk Score Software for Biobank-Scale Data.
GigaScience 8, no. 7 (July 1, 2019)
2024-02-12 09:13:27
D:\ukb_genetic_data\PRSice_win64.exe \
    --a1 A1 \
    --a2 A2 \
    --allow-inter  \
    --bar-levels 0.001,0.01,0.1,0.25,0.5,0.75,1 \
    --base iPSYCH-PGC_ASD_Nov2017.gz \
    --base-info INFO:0.9 \
    --beta  \
    --binary-target 5F \
    --bp BP \
    --chr CHR \
    --cov trial_cov \
    --cov-col Zage,Zsex,Zbatch,@Zpc[1-15],Zscan_x_pos,Zscan_y_pos,Zscan_z_pos \
    --extract ukb_distribution_011023.snp \
    --fastscore  \
    --geno 0.02 \
    --ignore-fid  \
    --no-clump  \
    --num-auto 22 \
    --out mri_asd_run_110224 \
    --pheno trial_pheno \
    --pheno-col Zcortical_vol2,ZCSF_vol2,ZGM_vol2,ZWM_vol2,ZTBV2 \
    --print-snp  \
    --pvalue P \
    --score std \
    --seed 1214445140 \
    --snp SNP \
    --stat OR \
    --target ukb_imp_chr#_v3,ukb30172_imp_chr22_v3_s487324.sample \
    --thread 1 \
    --type bgen

Command in R :

Call:

   lm(formula = Zcortical_vol2 ~ Pt_0.5 + Zscan_x_pos + Zscan_y_pos + 
        Zscan_z_pos + Zage + Zsex + Zbatch + Zpc1 + Zpc2 + Zpc3 + 
        Zpc4 + Zpc5 + Zpc6 + Zpc7 + Zpc8 + Zpc9 + Zpc10 + Zpc11 + 
        Zpc12 + Zpc13 + Zpc14 + Zpc15, data = data)
ADD REPLY
0
Entering edit mode

without the data, it is not possible for me to determine the reason of the error. Another likely possibility is the sample missingness (e.g. if you check the best file output from PRSice, it should have a column call In_Regression, only those with Yes were included)

ADD REPLY

Login before adding your answer.

Traffic: 1651 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6