I am currently working on rare-variant association studies using whole-genome sequencing (WGS) data.
I want to replicate my results in an array-genotyped cohort in which rare variants were previously imputed with the Michigan Imputation Server using the Haplotype Reference Consortium panel.
However, I found that my top hits were badly imputed, so I would like to re-impute the rare variants using a custom reference panel built from my own WGS data.
I see that IMPUTE2 and SHAPEIT are widely suggested for this purpose.
However, I can't find any clear explanation of how to generate a reference panel.
I have per-individual VCF files and PLINK files, and also a merged PLINK PED file that includes all the samples.
Could anybody kindly suggest a tutorial or resource where I can learn how to do this?
Is there another better strategy to impute those rare variants missed by Michigan server rather than generating my own reference panel?
I have yet to see a good tutorial for this either, and the documentation for these programs is rarely great.
As you already have your data in PLINK format, you can export that straight to GEN format for IMPUTE2 with the --recode oxford command-line parameter. You should, then, be able to use this straight away as a reference panel in IMPUTE2.
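As a rough illustration of the export step, here is a minimal Python sketch that only assembles the PLINK command line. The file prefixes (`wgs_merged`, `wgs_ref_chr22`) and the helper name are hypothetical; `--file`, `--bfile`, `--chr`, `--recode oxford` and `--out` are standard PLINK 1.9 options. Restricting to one chromosome is my own assumption, since IMPUTE2 is normally run region by region.

```python
import shlex


def plink_recode_oxford_cmd(prefix, out, chrom=None, binary=False):
    """Build a PLINK command that exports genotypes to Oxford GEN/SAMPLE files.

    prefix and out are hypothetical file prefixes chosen by the caller.
    """
    # --file reads prefix.ped/.map; --bfile reads prefix.bed/.bim/.fam
    cmd = ["plink", "--bfile" if binary else "--file", prefix]
    if chrom is not None:
        # Optional: keep a single chromosome in the exported files
        cmd += ["--chr", str(chrom)]
    # Writes out.gen and out.sample
    cmd += ["--recode", "oxford", "--out", out]
    return cmd


cmd = plink_recode_oxford_cmd("wgs_merged", "wgs_ref_chr22", chrom=22)
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once PLINK is on your PATH.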
Edit December 11, 2019: if you are planning to do pre-phasing into 1000 Genomes haplotypes using SHAPEIT, then SHAPEIT can read directly from PLINK format.
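For the pre-phasing route, a sketch along these lines may help. It only builds a SHAPEIT2 command from a PLINK binary fileset; all file names and the helper name are hypothetical, and the reference haplotype/legend/sample files are assumed to be ones you have already downloaded. The options used (`--input-bed`, `--input-map`, `--input-ref`, `--output-max`, `--thread`) are SHAPEIT2 options, but please check them against the version you have installed.

```python
import shlex


def shapeit_prephase_cmd(plink_prefix, genetic_map, out_prefix, ref=None, threads=1):
    """Build a SHAPEIT2 pre-phasing command that reads PLINK .bed/.bim/.fam files.

    ref, if given, is a (hap, legend, sample) tuple of reference panel files.
    """
    cmd = [
        "shapeit",
        "--input-bed",
        f"{plink_prefix}.bed", f"{plink_prefix}.bim", f"{plink_prefix}.fam",
        "--input-map", genetic_map,
    ]
    if ref is not None:
        hap, legend, sample = ref
        # Phase against reference haplotypes (e.g. 1000 Genomes)
        cmd += ["--input-ref", hap, legend, sample]
    # Writes the phased haplotypes and the matching sample file
    cmd += ["--output-max", f"{out_prefix}.haps", f"{out_prefix}.sample"]
    cmd += ["--thread", str(threads)]
    return cmd


cmd = shapeit_prephase_cmd(
    "array_chr22",
    "genetic_map_chr22.txt",
    "array_chr22.phased",
    ref=("ref_chr22.hap.gz", "ref_chr22.legend.gz", "ref.sample"),
    threads=4,
)
print(shlex.join(cmd))
```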
Conversely, starting from the VCF stage, I would merge your samples into a single VCF and then use this script to convert VCF format to GEN.
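For the merging step, one option (my assumption, not the only way) is `bcftools merge`. The sketch below only builds the command; the file names and helper name are hypothetical, and the inputs are assumed to be bgzip-compressed and indexed, which `bcftools merge` requires.

```python
import shlex


def bcftools_merge_cmd(vcfs, out):
    """Build a bcftools command that merges per-sample VCFs into one multi-sample VCF."""
    if len(vcfs) < 2:
        raise ValueError("need at least two VCF files to merge")
    # -Oz writes bgzip-compressed VCF; -o names the output file
    return ["bcftools", "merge", "-Oz", "-o", out, *vcfs]


cmd = bcftools_merge_cmd(["sample1.vcf.gz", "sample2.vcf.gz"], "merged.vcf.gz")
print(shlex.join(cmd))
```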
I would also consider creating a merged reference panel that consists of your data plus 1000 Genomes. There have been posts on this in the past but it's not a frequent topic. Here, I am trying to link all of them:
Thanks a lot Kevin for your suggestions,
I managed to create my own reference panel with the --recode oxford option in PLINK.
However, as you said, given my relatively small sample size, merging my reference panel with 1000 Genomes data could be a better choice.