The Sanger Imputation Service does not offer a pipeline that uses family information during phasing. Our pipeline has therefore been to phase the data externally with SHAPEIT, using family information and the genetic maps distributed with the 1000 Genomes Phase 3 reference panel. We then upload the VCFs to the Sanger Imputation Service and choose "1000 Genomes Phase 3" as the reference panel and "impute with PBWT, no pre-phasing":
# Download the genetic map for 1000 Genomes
wget https://mathgen.stats.ox.ac.uk/impute/1000GP_Phase3.tgz
tar -xf 1000GP_Phase3.tgz --wildcards "1000GP_Phase3/genetic_map_*"
wget https://mathgen.stats.ox.ac.uk/impute/1000GP_Phase3_chrX.tgz
tar -xf 1000GP_Phase3_chrX.tgz --wildcards "genetic_map_*"
# phase (with duohmm for autosomes)
for i in {1..22}; do shapeit -T 8 -B 15Nov2018/CHR$i -M 1000GP_Phase3/genetic_map_chr"$i"_combined_b37.txt -W 5 -O FGDS_chr$i --states 200 --duohmm ; done
However, I would now like to find genetic maps for the Haplotype Reference Consortium (HRC) panel, which includes more than 32,000 samples from 20 cohorts (including UK10K and the 1000 Genomes Project Phase 3). 1000G has only 2,504 samples, so the larger HRC panel should give more accurate imputation of lower-frequency variants, especially in European cohorts (though perhaps not for Asian or African ancestry).
Does anyone know where I can find this information for HRC?
Did you get an answer to your question? I'm also looking for a larger genetic map so I can convert genome coordinates to cM for as many variants as possible.
Hi, the data may be found here: https://mathgen.stats.ox.ac.uk/impute/1000GP_Phase3.html
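For the coordinate-to-cM conversion mentioned above, here is a minimal sketch using linear interpolation between map positions. It assumes the three-column layout of the genetic_map_chr*_combined_b37.txt files (position, rate in cM/Mb, cumulative cM, with one header line) and a positions file sorted in ascending order. The function name bp_to_cm and the file names are hypothetical, and positions outside the map are simply clamped to the first or last map value, which may not be what you want at chromosome ends.

```shell
# Hypothetical helper: interpolate a cM value for each base-pair position.
# $1 = genetic map (assumed columns: position, rate, cumulative cM; 1 header line)
# $2 = positions file, one base-pair position per line, sorted ascending
bp_to_cm() {
    awk '
        BEGIN { j = 1 }
        # First file: load the map, skipping the header line.
        NR == FNR {
            if (FNR > 1) { n++; bp[n] = $1 + 0; cm[n] = $3 + 0 }
            next
        }
        # Second file: clamp outside the map, interpolate inside it.
        {
            p = $1 + 0
            if (p <= bp[1]) { printf "%d\t%.6f\n", p, cm[1]; next }
            if (p >= bp[n]) { printf "%d\t%.6f\n", p, cm[n]; next }
            # Advance to the map interval with bp[j] <= p < bp[j+1].
            while (bp[j + 1] <= p) j++
            frac = (p - bp[j]) / (bp[j + 1] - bp[j])
            printf "%d\t%.6f\n", p, cm[j] + frac * (cm[j + 1] - cm[j])
        }
    ' "$1" "$2"
}

# Example call (file names are placeholders):
# bp_to_cm 1000GP_Phase3/genetic_map_chr22_combined_b37.txt chr22_positions.txt
```

The output is tab-separated: position, then interpolated cM. Variants beyond the ends of the map get no extrapolation, so a denser map would only change the values between existing points, not the covered range.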
See my threads for pre-phasing and imputation, here: