Hi,
I am currently trying to format some of my bim files I have to run imputation on the sanger imputation server. For this purpose I need to convert all my SNP to forward strand and to be matched with ref variant on grch37.
I used the tool by the Mccarthy group online (HRC Imputation Preparation and Checking) which gave me a list of plink commands to run on my original files to get them in the right format. I did that and I got the updated versions of the bfiles as I needed. I looked at the files to make sure that things were changed as they should have been and they were. However, when I make the change from the bfile to the vcf for some reason my ref and alt alleles keep getting switched around and I have no clue why/how to make it stay the way I want it.
For instance, when I run the imputation on my new vcf I get this error:
Reference allele mismatch at 1:1118275 .. REF_SEQ:'C' vs VCF:'T'
When I check my updated bfiles, I have the following:
1 rs61733845 0 1118275 C T
Furthermore, when I check which plink commands the Mccarthy tool suggested me to use on my original bfiles to do the ref/alt allele switching this (61733845) was one of the rsID's that it said needed to be flipped which indicates to me that the bfiles are fine but I am getting screwed over on the conversion to the vcf.
Even which I have tried to use plinks command to force the alleles to be as given by a reference text file when I am doing the recoding (like the one that is outputting by Mccarthys tools) with the commands (tried all of these(
--keep-allele-order
--real-ref-alleles
--a1-allele [filename] {A1 allele col. number} {variant ID col.} {skip}
(aliases: --reference-allele, --update-ref-allele)
--a2-allele [filename] {A2 allele col. number} {variant ID col.} {skip
I am still having no luck and I am still getting this reference allele mismatch at this position.
Any help would be greatly appreciated!!
Hi, I got the same problem. That is because when you were converting plink to vcf, probably using plink --recode vcf, "The A2 allele is saved as the reference and normally flagged as not based on a real reference genome ('PR' INFO field value)." See plink explanation here. That's saying the --recode vcf command switched the alleles in your updated bim file, where A1 is set as ref allele.
The solution is:
plink --bfile your_data --recode vcf --a2-allele Force-Allele1-DATA.QC-1000G.txt
Force-Allele1-DATA.QC-1000G.txt is the file that saves RsID and ref_allele (which can be generated by HRC checking tool)