10 months ago
Patrick
I am calculating a polygenic risk score (PRS) for a single sample in a VCF file.
I got the sample by filtering it out of a large 1000 Genomes dataset, so many of its variants are 0/0 (homozygous reference).
My question is: can I remove all variants that are 0/0, and does removing them affect the PRS I calculate? Even though there is only 1 sample in the file, it is very large, which makes it difficult to work with and may even be causing some of the errors I am encountering during the PRS calculation. (That's why I wanted to remove the 0/0 variants.)
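If you do want to drop the 0/0 records, here is a minimal Python sketch. It assumes an uncompressed, single-sample VCF in which GT is the first FORMAT field (the VCF spec requires GT to come first when present); the function and script name `drop_hom_ref` are hypothetical. Dedicated tools such as bcftools can do the same filtering more efficiently.

```python
import sys
from typing import Iterable, Iterator


def drop_hom_ref(lines: Iterable[str]) -> Iterator[str]:
    """Yield VCF lines, skipping records whose single sample is 0/0 or 0|0.

    Assumes an uncompressed, single-sample VCF where GT is the first
    FORMAT field. Header lines and all other genotypes are passed through.
    """
    for line in lines:
        if line.startswith("#"):
            yield line  # keep meta-information and header lines unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 10 (index 9) holds the sample; GT is its first ':'-separated value.
        gt = fields[9].split(":")[0] if len(fields) > 9 else ""
        if gt.replace("|", "/") == "0/0":
            continue  # homozygous reference (phased or unphased): drop the record
        yield line


if __name__ == "__main__":
    # Usage: python drop_hom_ref.py < sample.vcf > sample.nonref.vcf
    sys.stdout.writelines(drop_hom_ref(sys.stdin))
```

For a bgzipped `.vcf.gz`, you could pass `gzip.open(path, "rt")` instead of `sys.stdin`. Missing calls such as `./.` are kept.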
Dear Patrick, please elaborate on your 'PRS calculation'. Only then can we assist. Please share the relevant code and/or programs that you are using.
I am using the tool 'pgsc_calc' to calculate PRS for individuals using scores from the PGS-Catalog.
In terms of code, I don't really have any, since it's just a command-line tool: you input a VCF file and the PGS-Catalog score you want to use, and it returns the PRS for every sample in that VCF. In my case there will only be 1 sample in the VCF file, so only 1 PRS should be calculated.
If all of your cases/controls are homozygous reference (0/0) at a site, removing that site won't affect the summed scores, since its contribution (dosage × weight) is 0. However, all the scores would shift to larger magnitudes if you divide the sum by the number of non-missing sites, as PLINK does for its averaged score, because the denominator gets smaller.
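To make the arithmetic concrete, here is a minimal Python sketch with hypothetical weights and genotypes. It assumes the effect allele is the ALT allele, so a 0/0 call has dosage 0; if a score lists the REF allele as the effect allele, a 0/0 call has dosage 2 and removing it would change the sum. The averaged score divides by the allele count (2 per diploid variant), which is my assumption of how PLINK normalises its averaged score; dividing by the variant count instead changes it only by a constant factor of 2.

```python
from typing import Sequence, Tuple


def prs(dosages: Sequence[int], weights: Sequence[float]) -> Tuple[float, float]:
    """Return (summed score, averaged score) for one sample.

    dosages: effect-allele counts (0, 1 or 2) per variant.
    weights: per-variant effect weights, in the same order.
    The average divides by the allele count (2 per diploid variant).
    """
    total = sum(d * w for d, w in zip(dosages, weights))
    n_alleles = 2 * len(dosages)
    return total, total / n_alleles


# Hypothetical weights and genotypes (effect allele assumed to be ALT).
weights = [0.25, 0.5, 0.75, 0.125]
dosages = [0, 1, 0, 2]  # variants 1 and 3 are 0/0

# Drop the 0/0 (dosage 0) variants, as proposed in the question.
kept = [(d, w) for d, w in zip(dosages, weights) if d != 0]
kept_dosages = [d for d, _ in kept]
kept_weights = [w for _, w in kept]

print(prs(dosages, weights))            # (0.75, 0.09375)
print(prs(kept_dosages, kept_weights))  # (0.75, 0.1875)
```

The sum is identical with and without the 0/0 variants, while the average doubles here because half of the variants were removed from the denominator.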
Here is a spreadsheet where you can play around with this:
https://docs.google.com/spreadsheets/d/1Vm3fAb4TDFOMJOEobX-Ou-tjGliA2Sh65lw1xWQ3RKQ/edit#gid=0