Hello, I am running PRSice-2,
but many variants are not being included during the process.
My code:
Rscript PRSice.R \
--prsice PRSice_linux \
--dir /home/projects/base-files/T2test/PRSice \
--base T2baseNoBMI.uniq.txt \
--target contn.analysis.PRS# \
--pheno T2.train.control60.txt \
--cov covariates.cont60.txt \
--binary-target T \
--snp rsID \
--chr CHR \
--bp BP \
--A1 A1 \
--A2 A2 \
--stat BETA \
--pvalue Pvalue \
--extract PRSice.valid \
--print-snp \
--score sum \
--out PRSice-res
My log file suggests that 1476197 SNPs are not being included:
5965221 variant(s) observed in base file, with:
1310476 variant(s) excluded based on user input
4654745 total variant(s) included from base file
Loading Genotype info from target
==================================================
187786 people (86770 male(s), 101016 female(s)) observed
187786 founder(s) included
1476197 variant(s) not found in previous data
226 variant(s) with mismatch information
4654745 variant(s) included
In my base file, the effect allele (A1) comes first and the reference allele (A2) second. In the target data, which is in bed/bim/fam format, the reference allele comes first and the effect allele second. Is this causing the exclusion of so many SNPs? If so, how can I fix it?
Or is there something else I am missing?
Thanks a lot
Ok, so you think excluding 1.5 million variants won't make much difference, since almost 4.5 million variants are already considered by PRSice? Actually, I used an external GWAS that has only chromosome and base-position columns. To get rsIDs, I matched its base positions against my target data (from UKBB), which has rsIDs. So the base and target should have the same number of matching rsIDs, but I am still confused why PRSice is unable to process millions of SNPs in my analysis.
So your base data is in chr:bp format and your target has rsIDs. You can check the overlap between the two, in terms of the number of SNPs, using R
and see what numbers you get. I am assuming the rsID column in your base file is the original ID.
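If R is not at hand, the same overlap check can be sketched in shell with awk. This is only a rough illustration: the function name `count_overlap` is made up for this example, it assumes the base file is whitespace-delimited with a header line, and it relies on the second column of a PLINK .bim file being the variant ID. It counts each base ID once, even if the ID is duplicated in the base file.

```shell
#!/bin/sh
# Hypothetical helper (not part of PRSice): count how many distinct SNP IDs
# in a base (GWAS) file also appear in a PLINK .bim file.
# Usage: count_overlap BASE_FILE SNP_COLUMN_NAME BIM_FILE
count_overlap() {
    base_file=$1
    snp_col=$2
    bim_file=$3
    awk -v col="$snp_col" '
        # First file (.bim): remember every variant ID (column 2).
        NR == FNR { in_target[$2] = 1; next }
        # Header line of the base file: find the SNP ID column by name.
        FNR == 1 {
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) {
                print "column not found: " col > "/dev/stderr"
                bad = 1
                exit 1
            }
            next
        }
        # Data lines: count an ID once if it is also present in the target.
        ($idx in in_target) && !seen[$idx]++ { shared++ }
        END { if (!bad) print shared + 0 }
    ' "$bim_file" "$base_file"
}

# Example call, assuming the chromosome 1 target file is named
# contn.analysis.PRS1.bim (an assumption based on the "#" in --target):
# count_overlap T2baseNoBMI.uniq.txt rsID contn.analysis.PRS1.bim
```

Because the target is split by chromosome, the per-chromosome .bim files would need to be concatenated into one file first (or the function run once per chromosome) before comparing the total against the numbers in the PRSice log.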