2.9 years ago
optimistsso4co3
Comparing my PLINK allele frequencies with a reference panel, I get the following graph:
As you can see, the vast majority of variants align well with the reference; however, some are directly inverted. I have already excluded all variants with mismatched alleles (strand flips and allele switches).
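To make "directly inverted" concrete, here is a minimal Python sketch of how each variant could be labelled by comparing its frequency with the reference. It is only an illustration: the function names and the `tol` threshold of 0.1 are assumptions, not something taken from PLINK. A frequency `f` counts as inverted when `1 - f` matches the reference but `f` does not. One common cause (an assumption about this dataset, not something shown here) is palindromic A/T or C/G SNPs, where a strand flip cannot be detected from the allele codes alone.

```python
def is_palindromic(a1, a2):
    """True for A/T and C/G SNPs, whose alleles read the same on both strands."""
    return {a1.upper(), a2.upper()} in ({"A", "T"}, {"C", "G"})


def classify_frequency(sample_af, ref_af, tol=0.1):
    """Label one variant by comparing sample and reference allele frequency.

    tol is a hypothetical tolerance; choose it from your own scatter plot.
    """
    fits_direct = abs(sample_af - ref_af) <= tol
    fits_flipped = abs((1.0 - sample_af) - ref_af) <= tol
    if fits_direct and fits_flipped:
        # Near 0.5 both hypotheses fit, so frequency alone cannot decide.
        return "ambiguous"
    if fits_direct:
        return "consistent"
    if fits_flipped:
        return "inverted"
    return "deviant"


if __name__ == "__main__":
    for af, ref in [(0.20, 0.22), (0.80, 0.22), (0.48, 0.50), (0.10, 0.50)]:
        print(af, ref, classify_frequency(af, ref))
```

The "ambiguous" label is exactly the 0.4-0.6 band: there, frequency comparison cannot tell a correct variant from an inverted one.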
What is the correct way to fix this? I could simply remove variants with a large frequency deviation from the reference; however, that leaves a lot of incorrect SNPs in the middle. Alternatively, if I exclude everything in the middle region, it leaves a big empty range of frequencies (0.4-0.6).
This worked!
Here is the full code to normalise PLINK genotypes:
If the PLINK genotypes are in the hg19 build, liftOver them to hg38:
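One detail of this step that is easy to get wrong is building the liftOver input from the `.bim` file: UCSC BED coordinates are 0-based and half-open, while `.bim` positions are 1-based. Below is a minimal Python sketch of that conversion only, assuming a standard six-column `.bim`. The function name `bim_line_to_bed` and the chromosome mapping (PLINK codes 23/24/25/26 to X/Y/X/M) are illustrative assumptions, not part of the code above.

```python
# PLINK numeric chromosome codes mapped to UCSC-style names (assumed mapping).
PLINK_TO_UCSC = {"23": "X", "24": "Y", "25": "X", "26": "M", "MT": "M"}


def bim_line_to_bed(line):
    """Convert one .bim line into a 4-column BED line for liftOver.

    .bim columns: chromosome, variant ID, cM position, bp position, A1, A2.
    """
    fields = line.split()
    chrom, variant_id, pos = fields[0], fields[1], int(fields[3])
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    chrom = "chr" + PLINK_TO_UCSC.get(chrom, chrom)
    # BED start is 0-based, end is exclusive: a 1-based SNP at pos spans [pos-1, pos).
    return f"{chrom}\t{pos - 1}\t{pos}\t{variant_id}"


if __name__ == "__main__":
    print(bim_line_to_bed("1\trs123\t0\t1000\tA\tG"))
```

After liftOver, the fourth BED column (the variant ID) can be used to map the new positions back onto the PLINK variants; variants that fail to lift would need to be dropped.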
I installed all the tools through conda (plink, plink2, samtools, ucsc-liftover). Thanks to Pierre Lindenbaum and to Freeseek's apol1 blog post: http://apol1.blogspot.com/2016/10/1000-genomes-project-phase-3-principal.html