Question

Genotype imputation - do SNP IDs of input files and reference panels matter?

3.4 years ago

Volka ▴ 180

Hi all,

I am currently performing genotype imputation with Minimac4, using a mixture of reference panels and input files. In some of my input VCFs, I reannotated the SNP IDs so that they follow the format CHR:POS:REF:ALT. In my other input files, the SNP IDs are a mix of rsIDs and CHR:POS. Apart from the SNP IDs, the files are identical.

I have already run imputation on both kinds of input: for example, Run A, where the SNP IDs are in the format CHR:POS:REF:ALT, and Run B, where the SNP IDs are a mix. My question is, do I have to go back and standardize the SNP ID formats before imputation, or would the result be the same regardless? Does Minimac4 carry out imputation based on the SNP position only?

genotype SNP reference VCF imputation • 878 views

My guess is that the IDs aren't important and the imputation is based upon the POS column (of course scaled by genetic distance).

Comment by 4galaxy77 (2.9k), 3.4 years ago

score 2 · Accepted Answer · 2021-07-05

It very likely doesn't need that information, but the only ways to be sure would be to look at the code or to try it. It generates the imputation estimates based on linkage and, yes, location, although the exact base position is not really as important as having the variants in the correct order. If you do want consistent IDs, it should only take a very short script to replace the old names with the new ones, provided you have both files handy.
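As a rough illustration of how short such a script can be, here is a minimal Python sketch that rewrites the ID column of an uncompressed VCF to CHR:POS:REF:ALT. The function name `standardize_vcf_ids` is made up for this example, and it assumes tab-delimited VCF records with the standard column order (CHROM, POS, ID, REF, ALT, ...). Multi-allelic sites keep their comma-separated ALT in the new ID, which may or may not be what you want.

```python
def standardize_vcf_ids(lines):
    """Yield VCF lines with the ID column rewritten as CHR:POS:REF:ALT.

    Hypothetical helper for illustration; assumes standard tab-delimited
    VCF records (CHROM, POS, ID, REF, ALT, ...).
    """
    for line in lines:
        if line.startswith("#"):
            # Meta-information and the column header pass through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _old_id, ref, alt = fields[:5]
        # Column 3 (index 2) is the ID; everything else is left untouched.
        fields[2] = f"{chrom}:{pos}:{ref}:{alt}"
        yield "\t".join(fields) + "\n"


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "1\t1000\trs123\tA\tG\t.\tPASS\t.\n",
        "1\t2000\t1:2000\tC\tT\t.\tPASS\t.\n",
    ]
    for out_line in standardize_vcf_ids(example):
        print(out_line, end="")
```

For real files you would pass an open file handle instead of the in-memory list and write the result to a new VCF, leaving the original untouched.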

Hope that helps.
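On the "try it" option: since Run A and Run B already exist, one way to check is to compare their outputs site by site while ignoring the ID column. The sketch below is only an illustration under some assumptions: the function names are invented, both outputs are read as plain-text VCF lines, the samples are in the same order in both files, and sites are matched on CHROM, POS, REF and ALT. It compares the per-sample columns as raw strings, so any formatting difference also counts as a difference.

```python
def records_by_site(lines):
    """Map (CHROM, POS, REF, ALT) to the tuple of per-sample columns.

    Hypothetical helper; the ID column is deliberately ignored.
    """
    sites = {}
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and the column header
        fields = line.rstrip("\n").split("\t")
        key = (fields[0], int(fields[1]), fields[3], fields[4])
        # Columns from index 9 onwards hold the per-sample data.
        sites[key] = tuple(fields[9:])
    return sites


def differing_sites(run_a_lines, run_b_lines):
    """Return sorted site keys that are missing from one run or differ."""
    a = records_by_site(run_a_lines)
    b = records_by_site(run_b_lines)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


if __name__ == "__main__":
    header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n"
    run_a = [header, "1\t1000\t1:1000:A:G\tA\tG\t.\tPASS\t.\tGT:DS\t0|1:1.0\n"]
    run_b = [header, "1\t1000\trs123\tA\tG\t.\tPASS\t.\tGT:DS\t0|1:1.0\n"]
    print(differing_sites(run_a, run_b))
```

An empty list means the two runs agree at every site despite the different IDs. Imputed output is usually compressed, so in practice the lines would come from something like `gzip.open(path, "rt")` rather than an in-memory list.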