7.2 years ago
GabrielMontenegro
I am trying to merge two PLINK files genotyped on the same platform. However, I found some inconsistencies that could not be resolved with the --flip command. I checked the SNPs that still could not be merged after flipping and found this type of problem:
File 1:
4 rs10000432 68.93 47511781 T C
File 2:
4 rs10000432 68.93 47511781 A C
One of the alleles differs, while the other allele is the same in both files.
What should I do in this case? I wasn't expecting to find this type of issue in these datasets, since both were genotyped on the same platform.
Just an initial comment: for that particular SNP, the ancestral 'reference' allele is C (https://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=10000432).
When you attempted to merge the first time, PLINK may have output a file with the extension *.missnp, which would list these apparently multi-allelic sites (variants with more than two alleles across the two files). You can remove them from both datasets, for example with PLINK's --exclude option, and then attempt to merge again.
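To see why --flip cannot rescue a SNP like the one above, here is a minimal Python sketch that compares the allele pairs from two .bim files. It is only an illustration, not what PLINK does internally: the function names are made up for this example, and it assumes plain A/C/G/T alleles that are called in both files.

```python
# Sketch only: classify how the alleles of a SNP relate between two .bim files.
# Assumes single-base A/C/G/T alleles; other codes (e.g. "0") are left unchanged.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def classify(alleles_1, alleles_2):
    """Return 'match', 'flip', or 'mismatch' for two (a1, a2) allele pairs."""
    set_1 = set(alleles_1)
    set_2 = set(alleles_2)
    if set_1 == set_2:
        return "match"  # same alleles, possibly listed in swapped order
    flipped = {COMPLEMENT.get(a, a) for a in set_2}
    if set_1 == flipped:
        return "flip"  # the two files use opposite strands
    return "mismatch"  # still inconsistent after strand flipping


def read_bim(lines):
    """Map SNP ID -> (allele1, allele2) from .bim-formatted lines."""
    alleles = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 6:
            alleles[fields[1]] = (fields[4].upper(), fields[5].upper())
    return alleles


def incompatible_snps(bim_1, bim_2):
    """List shared SNP IDs whose alleles disagree even after flipping."""
    shared = bim_1.keys() & bim_2.keys()
    return sorted(s for s in shared if classify(bim_1[s], bim_2[s]) == "mismatch")


# The two records from the question, used here as in-memory example data.
file_1 = ["4 rs10000432 68.93 47511781 T C"]
file_2 = ["4 rs10000432 68.93 47511781 A C"]
to_exclude = incompatible_snps(read_bim(file_1), read_bim(file_2))
print(to_exclude)
```

For rs10000432, flipping File 2 turns A/C into T/G, which still does not equal T/C, so the SNP ends up in the exclusion list. A list like this, written one ID per line, has the same format as a *.missnp file and could be passed to --exclude in the same way.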
If you don't necessarily want to remove these, then you may have to do more rigorous data preparation. In which format was your data initially: VCF? It would be useful to check every genotype against a reference genome and ensure that the ref>alt order is maintained.