Dear community members,
I am trying to adapt an array-based PRS (the well-known 313-marker breast cancer PRS) to WGS data from thousands of samples.
Five of the array variants were not found in the VCF calls; for those I either found linked variants or fixed the notation.
However, for another 6-10 variants the allele frequency observed in WGS differs substantially from the one observed on arrays. Both populations are EUR, although from different countries (the 313-variant PRS was built partly on UK Biobank data). For example, a variant has AF 0.6 on arrays and 0.4 in WGS. The remaining 303-307 variants have approximately the same AF in WGS as on arrays (the differences are no larger than expected from the cohort sizes).
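To make "very different" concrete, here is a minimal sketch of how the discordant variants could be flagged with a two-proportion z-test on allele counts. The function names, the cohort sizes, the variant IDs and the z cutoff of 5 are illustrative assumptions, not values from my data. The sketch also tests whether swapping the counted allele (AF becomes 1 - AF) would explain the difference.

```python
import math

def af_z(af_a, n_a, af_b, n_b):
    """Two-proportion z statistic for two allele frequencies.

    n_a and n_b are numbers of diploid samples (assumed), so each
    cohort contributes 2 * n alleles.
    """
    alleles_a, alleles_b = 2 * n_a, 2 * n_b
    pooled = (af_a * alleles_a + af_b * alleles_b) / (alleles_a + alleles_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / alleles_a + 1 / alleles_b))
    if se == 0:
        # Both cohorts are monomorphic for the same allele, or AFs differ
        # while the pooled estimate is degenerate.
        return 0.0 if af_a == af_b else math.inf
    return (af_a - af_b) / se

def flag_discordant(variants, n_array, n_wgs, z_cutoff=5.0):
    """Return (variant_id, z, explained_by_allele_swap) for discordant variants.

    variants: iterable of (variant_id, af_array, af_wgs).
    z_cutoff is a hypothetical threshold; tune it to the number of variants.
    """
    flagged = []
    for vid, af_array, af_wgs in variants:
        z = af_z(af_array, n_array, af_wgs, n_wgs)
        if abs(z) > z_cutoff:
            # Would counting the other allele on the array side fix it?
            z_swapped = af_z(1 - af_array, n_array, af_wgs, n_wgs)
            flagged.append((vid, z, abs(z_swapped) <= z_cutoff))
    return flagged

# Hypothetical example: 5000 array samples, 3000 WGS samples.
example = [("var_1", 0.60, 0.45), ("var_2", 0.25, 0.26)]
result = flag_discordant(example, n_array=5000, n_wgs=3000)
```

In this made-up example only `var_1` is flagged, and the allele swap does not explain it.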
What could be the reason, and how can I fix it? Should I replace the "off" variants with others that are in strong LD with them? Is there an offline tool for this? (So far I have used LDproxy, but it is not suitable for automated analysis.)
UPD: the 0.6 and 0.4 values are a coincidence; the discrepancy cannot be solved by simply "flipping" the allele frequencies.
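To illustrate the kind of automated proxy search I mean, here is a minimal sketch that ranks candidate variants by r² with the "off" variant. It assumes genotypes are already loaded as 0/1/2 dosages (with `None` for missing calls) from either the WGS cohort itself or a reference panel. The function names and the 0.8 cutoff are hypothetical. The correlation of unphased dosages is only an approximation of haplotype-based r².

```python
import math

def dosage_correlation(dosage_x, dosage_y):
    """Pearson correlation of two dosage vectors, skipping missing calls."""
    pairs = [(x, y) for x, y in zip(dosage_x, dosage_y)
             if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return math.nan
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return math.nan  # monomorphic variant, correlation undefined
    return cov / math.sqrt(var_x * var_y)

def best_proxies(target, candidates, min_r2=0.8):
    """Return [(variant_id, r2, same_direction)] sorted by decreasing r2.

    candidates: dict mapping variant_id -> dosage vector.
    same_direction is False when the counted allele of the proxy tracks
    the other allele of the target variant.
    """
    kept = []
    for vid, dosages in candidates.items():
        r = dosage_correlation(target, dosages)
        if not math.isnan(r) and r * r >= min_r2:
            kept.append((vid, r * r, r > 0))
    return sorted(kept, key=lambda item: item[1], reverse=True)

# Toy data for six samples (made up for illustration).
target = [0, 1, 2, 1, 0, 2]
candidates = {
    "proxy_same": [0, 1, 2, 1, 0, 2],
    "proxy_opposite": [2, 1, 0, 1, 2, 0],
    "unlinked": [1, 1, 0, 2, 1, 1],
}
proxies = best_proxies(target, candidates)
```

If a proxy comes out with `same_direction` set to False, the effect allele of the PRS weight would have to be switched to the proxy's other allele when substituting it.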