I am trying to merge two imputed binary filesets (prephased with SHAPEIT and imputed with IMPUTE2) using PLINK's --bmerge command, but this error pops up:
Error: Identical A1 and A2 alleles on line 1
I am pretty sure my data contain many monomorphic (single-allele) SNPs, so I was wondering whether there is a quick way to solve this problem. I checked PLINK's manual, but there seems to be no option to ignore or correct such SNPs.
If possible, I would like to avoid looking for a solution in SHAPEIT or IMPUTE2, because prephasing and imputation already took a very long time to run.
This might be due to an incompatibility between PLINK's Oxford import and the latest IMPUTE2 output format. Can you send me a small .gen/.sample fileset that generates this problem? (You can probably omit all lines in the .gen file past the first 3-4.)
No, these specific SNPs weren't the problem; they were just the first ones in the IMPUTE2 file, and I was doing exactly what you asked (i.e., pasting the first 3-4 lines). I guess the SNP order changed when I converted from .gen to PLINK format.
I have found the line in the IMPUTE2 file that contains the monomorphic SNP:
All the lines I've pasted here are truncated; the actual lines are much longer, since they cover many more individuals.
I used regular expressions with grep and awk to build a list of these monomorphic SNPs and then removed them with PLINK's --exclude. This does resolve the issue; I was just wondering whether PLINK has a built-in flag to skip this problem, which, by the way, only appears when I use the --merge commands.
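For anyone wanting a portable alternative to the grep/awk step, here is a minimal Python sketch. It assumes the standard six-column PLINK .bim layout (chromosome, variant ID, cM position, bp position, A1, A2) and simply flags every variant whose two allele codes are identical. The script and function names are illustrative, not part of PLINK.

```python
import sys
from typing import Iterable, List


def identical_allele_snps(bim_lines: Iterable[str]) -> List[str]:
    """Return the IDs of variants whose A1 and A2 codes are identical.

    Assumes the standard six-column .bim layout:
    chromosome, variant ID, cM position, bp position, A1, A2.
    """
    bad = []
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        variant_id, a1, a2 = fields[1], fields[4], fields[5]
        if a1 == a2:
            bad.append(variant_id)
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_identical_alleles.py data.bim > exclude.txt
    with open(sys.argv[1]) as bim:
        for snp in identical_allele_snps(bim):
            print(snp)
```

The resulting list can then be passed to PLINK before merging, e.g. `plink --bfile data --exclude exclude.txt --make-bed --out data_clean`, run once per fileset.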
There isn't currently a built-in command, since this generally indicates a data processing error that should be fixed at the source. But do you know what caused IMPUTE2 to generate a file with identical A1 and A2 allele codes here? If this happens routinely, and only with monomorphic SNPs, I will modify the .gen (and, if necessary, .bgen) import routines to automatically zero out one of the allele codes in this case.
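Until such a change exists, a similar effect could be approximated by preprocessing the .gen file before import. The sketch below is hypothetical and is not PLINK's import code: it assumes the usual .gen layout (SNP ID, rsID, position, allele A, allele B, then three genotype probabilities per sample) and, when the two allele codes match, arbitrarily replaces allele A with "0", PLINK's conventional missing-allele code.

```python
import sys
from typing import Iterable, Iterator

MISSING_ALLELE = "0"  # PLINK's conventional missing-allele code


def zero_duplicate_allele(gen_line: str) -> str:
    """Replace allele A with the missing code if it equals allele B.

    Assumes alleles sit in the 4th and 5th whitespace-separated columns.
    Output fields are re-joined with single spaces.
    """
    fields = gen_line.split()
    if len(fields) >= 5 and fields[3] == fields[4]:
        fields[3] = MISSING_ALLELE
    return " ".join(fields)


def fix_gen_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield corrected .gen lines, skipping blank ones."""
    for line in lines:
        if line.strip():
            yield zero_duplicate_allele(line)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python fix_gen.py input.gen > fixed.gen
    with open(sys.argv[1]) as gen:
        for fixed in fix_gen_lines(gen):
            print(fixed)
```

Zeroing allele A rather than allele B is an arbitrary choice in this sketch; either works as long as the two codes no longer collide.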