I am trying to use the PRSice package to calculate a polygenic risk score for my group of samples. My input files are .imputed files (generated with IMPUTE2) that have been modified with bash to contain dosage data.
In other words, instead of looking like this (3 probabilities per SNP per individual: AA, AB, BB):
SNP1 rs1 1000 A C 1 0 0 1 0 0
SNP2 rs2 2000 G T 1 0 0 0 1 0
SNP3 rs3 3000 C T 1 0 0 0 1 0
SNP4 rs4 4000 C T 0 1 0 0 1 0
SNP5 rs5 5000 A G 0 1 0 0 0 1
my .imputed files actually look like this (1 value/column per SNP per individual):
SNP1 rs1 1000 A C 0 0
SNP2 rs2 2000 G T 0 1
SNP3 rs3 3000 C T 0 1
SNP4 rs4 4000 C T 1 1
SNP5 rs5 5000 A G 1 2
Calculation used to obtain the dosage: dosage = [0 * p(AA)] + [1 * p(AB)] + [2 * p(BB)]
Actual command in bash:
cat file.imputed | awk '{printf $1"\t"$2"\t"$3"\t"$4"\t"$5; for(i=6; i<NF; i+=3) {if($(i+0)==0 && $(i+1)==0 && $(i+2)==0) printf "\tNA"; else printf "\t"$(i+0)*0+$(i+1)*1+$(i+2)*2}; printf "\n"}' > file_dosages.imputed
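For readability, here is the same calculation sketched as a small shell function. The name to_dosage is hypothetical (it is not part of PRSice, PLINK or IMPUTE2), and it assumes whitespace-separated input with five leading columns followed by one p(AA) p(AB) p(BB) triplet per individual. As in the one-liner, an all-zero triplet is written as NA.

```shell
# to_dosage: hypothetical helper, reads IMPUTE2-style lines on stdin and
# writes tab-separated dosage lines on stdout.
to_dosage() {
  awk '{
    # Columns 1-5 (SNP ID, rsID, position, allele A, allele B) pass through.
    printf "%s\t%s\t%s\t%s\t%s", $1, $2, $3, $4, $5
    # Each individual contributes three columns, starting at column 6.
    for (i = 6; i + 2 <= NF; i += 3) {
      if ($i == 0 && $(i+1) == 0 && $(i+2) == 0)
        printf "\tNA"    # all-zero triplet: treated as a missing genotype
      else
        printf "\t%s", 0*$i + 1*$(i+1) + 2*$(i+2)    # expected B-allele count
    }
    printf "\n"
  }'
}
```

It would be called as: to_dosage < file.imputed > file_dosages.imputed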
I also have a .sample file which has phenotypes. The format is:
ID_1 ID_2 missing father mother sex pheno cov1 cov2 cov3 cov4 cv5
0 0 0 D D D B D D D D D
sample1 sample1 0 0 0 0 0 0 2 2 1 3
sample2 sample2 0 0 0 0 1 0 0 1 0 3
sample3 sample3 0 0 0 0 0 0 4 999 0 3
sample4 sample4 0 0 0 0 0 0 4 0 1 2
My command in PRSice is:
R --file=/Users/bob/Downloads/PRSice_v1.25/PRSice_v1.25.R -q --args \
plink /Users/bob/Downloads/plink-1.07-mac-intel/plink \
base ../../ORs.txt\
target ../../file_dosages.imputed \
dosage T \
dos.format 1 \
dos.impute2 T \
dos.sep.fam ../../Pheno_files_for_PRSice/file.sample \
dos.fam.is.samp T \
slower 0.000000001 \
sinc 0.01 \
supper 0.5 \
covary F \
clump.snps F \
report.individual.scores T
However, I get the following error:
Error in file(file, "rt") : cannot open the connection
Calls: read.table -> file
Execution halted
The PROFILES.log file says:
Reading dosage information from [ ../../file_dosages.imputed ]
Format set to three genotype probabilities
Writing results to [ PROFILES.assoc.dosage ]
I have the impression that PRSice does not recognize that my dosage file has one value per SNP per individual, since the log reports "Format set to three genotype probabilities". I don't understand why my command doesn't work. Any help would be much appreciated.