Imputation from plink
5.0 years ago

I’ve got GWAS data in plink format (bed, bim, fam). I need to impute some SNPs that weren’t directly genotyped. I’ve read that I need to phase first, e.g. with shapeit, and then impute, e.g. with impute2. I’m having trouble figuring out which genetic map to use for shapeit (the one linked from their guide doesn’t work). What would be really helpful is a step-by-step guide to go from plink files to imputed SNPs, as this process seems quite painful. Here's what I've done:

I'm on a Mac, so I had to set up an Ubuntu VirtualBox VM to run shapeit (shapeit.v2.904.3.10.0-693.11.6.el7.x86_64); the example data from the tutorial works. My GWAS data is in plink format (TRACK-HD_v3_qc_imputed_v3.bed, TRACK-HD_v3_qc_imputed_v3.bim, TRACK-HD_v3_qc_imputed_v3.fam).

I read the shapeit documentation, which says under 'Genetic map' to click this link to download the map for human populations (http://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html). My GWAS is in GRCh37, so I want to download the 'HapMap phase II b37' map, but that link doesn't work (http://www.shapeit.fr/files/genetic_map_b37.tar.gz).

I've been looking for an alternative genetic map. First I went to HapMap (http://hapmap.ncbi.nlm.nih.gov), but that's been retired. I then went to their archive (ftp://ftp.ncbi.nlm.nih.gov/hapmap/), but it's not at all clear which file to use as the map. I also read that 1KG can be used as a map, so I went there, but again it's not clear which file to use (https://www.internationalgenome.org/data).

I've tried using this as a genetic map - 'genetic_map_chr1_combined_b36.txt', but I get the following:

michael@michael-VirtualBox:~/bin/shapeit.v2.904.3.10.0-693.11.6.el7.x86_64/bin$ ./shapeit --input-bed TRACK-HD_v3_qc_imputed_v3.bed TRACK-HD_v3_qc_imputed_v3.bim TRACK-HD_v3_qc_imputed_v3.fam --input-map genetic_map_chr1_combined_b36.txt --output-max TRACK-HD_v3_qc_imputed_v3_phased.haps TRACK-HD_v3_qc_imputed_v3_phased.sample

Segmented HAPlotype Estimation & Imputation Tool
  * Authors : Olivier Delaneau, Jared O'Connell, Jean-François Zagury, Jonathan Marchini
  * Contact : send an email to the OXSTATGEN mail list https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=OXSTATGEN
  * Webpage : https://mathgen.stats.ox.ac.uk/shapeit
  * Version : v2.r904
  * Date    : 24/11/2019 14:30:04
  * LOGfile : [shapeit_24112019_14h30m04s_8d07a6e7-5f9d-45c0-8e20-706fd12a0ba6.log]

MODE -phase : PHASING GENOTYPE DATA
  * Autosome (chr1 ... chr22)
  * Window-based model (SHAPEIT v2)
  * MCMC iteration

Parameters :
  * Seed : 1574605804
  * Parallelisation: 1 threads
  * Ref allele is NOT aligned on the reference genome
  * MCMC: 35 iterations [7 B + 1 runs of 8 P + 20 M]
  * Model: 100 states per window [100 H + 0 PM + 0 R + 0 COV ] / Windows of ~2.0 Mb / Ne = 15000

Reading site list in [TRACK-HD_v3_qc_imputed_v3.bim]

ERROR: Duplicate site pos=40345847 ref=A alt=AAAC
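The error above means the .bim file lists the same position more than once. As a minimal sketch, the duplicated sites could be listed with a few lines of Python before re-running shapeit. The function name `find_duplicate_sites` and the example rows are made up for illustration, and flagging by chromosome plus base-pair position is an assumption about what shapeit treats as a duplicate, not something taken from its documentation.

```python
from collections import defaultdict


def find_duplicate_sites(bim_lines):
    """Return {(chrom, pos): [variant IDs]} for positions listed more than once.

    Hypothetical helper: expects lines in .bim column order
    (chromosome, variant ID, genetic distance, bp position, allele 1, allele 2).
    """
    sites = defaultdict(list)
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        chrom, variant_id, _cm, pos = fields[:4]
        sites[(chrom, int(pos))].append(variant_id)
    # Keep only positions that occur at least twice on the same chromosome.
    return {site: ids for site, ids in sites.items() if len(ids) > 1}


# Made-up rows for illustration only (not taken from the real dataset).
example = [
    "1 rs_a 0 1000 A G",
    "1 rs_b 0 2000 A AAAC",
    "1 indel_b 0 2000 A AAAC",
    "2 rs_c 0 2000 C T",
]
duplicates = find_duplicate_sites(example)
for (chrom, pos), ids in sorted(duplicates.items()):
    print(chrom, pos, " ".join(ids))
```

On a real fileset the function can be given an open file handle instead of a list, e.g. `find_duplicate_sites(open("TRACK-HD_v3_qc_imputed_v3.bim"))`. One option is then to write the unwanted variant IDs to a text file, one per line, and drop them with plink's `--exclude` before phasing.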

All in all I'm a bit fed up and going to have a break from this for a while. Any help for when I come back to it later this evening would be much appreciated! Thanks!

plink shapeit impute2 • 2.3k views

I have recently gone through this entire process (pre-phasing and then imputation) for 2 different cohorts. What do you mean when you say that the shapeit genetic map "doesn't work"? PS - indeed, it is painful (and should not be necessary).


Thanks, I've updated above


Thank you. Yes, some of the links are broken, unfortunately.

I obtained my genetic maps from here: Download Reference Data

I then show how SHAPEIT is essentially a 3 stage process, here: C: Strand Alignment in ShapeIt -- "Reference and Main panels are not well aligned"
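One practical point for the command in the question: as far as I know, SHAPEIT2 phases one chromosome per run, so a genome-wide plink fileset is normally split by chromosome first and each piece is paired with the genetic map for that chromosome. Below is a rough sketch that only builds the command lines, assuming `plink` and `shapeit` are on the PATH. The helper name `per_chromosome_commands`, the output prefixes and the map filename pattern are assumptions for illustration and should be adjusted to whatever the downloaded reference data actually contains.

```python
def per_chromosome_commands(prefix, map_template, chromosomes=range(1, 23)):
    """Build (plink split, shapeit phase) argument lists for each autosome.

    Hypothetical helper: nothing is executed here, the lists are only
    assembled so they can be inspected or passed to subprocess.run().
    """
    commands = []
    for chrom in chromosomes:
        chr_prefix = f"{prefix}.chr{chrom}"
        # Extract a single chromosome into its own bed/bim/fam fileset.
        split = ["plink", "--bfile", prefix, "--chr", str(chrom),
                 "--make-bed", "--out", chr_prefix]
        # Phase that chromosome with the matching genetic map.
        phase = ["shapeit",
                 "--input-bed", f"{chr_prefix}.bed", f"{chr_prefix}.bim",
                 f"{chr_prefix}.fam",
                 "--input-map", map_template.format(chrom=chrom),
                 "--output-max", f"{chr_prefix}.phased.haps",
                 f"{chr_prefix}.phased.sample"]
        commands.append((split, phase))
    return commands


# The map filename pattern is an assumption; match it to the downloaded files.
commands = per_chromosome_commands(
    "TRACK-HD_v3_qc_imputed_v3",
    "genetic_map_chr{chrom}_combined_b37.txt",
)
for split, phase in commands[:1]:
    print(" ".join(split))
    print(" ".join(phase))
```

The map must be on the same genome build as the genotypes, so for GRCh37 data a b37 map is needed, not the b36 file used in the question.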
