Question

Run Time Of Imputation Using 1000 Genomes

3

13.1 years ago

Psb ▴ 30

I am imputing a few candidate regions of 1.5 Mb using MaCH, with 1000 Genomes Phase 1 as the reference dataset. I am following a two-step approach for imputation. The first step took 3 hours to run, with this command line:

mach1 -p chr1.ped -d chr1.dat -h chr1ref.hap -s chr1ref.snp --greedy --rounds 50 --states 200 --compact --autoFlip --prefix step1_chr1

For the second step of imputation, the command line was:

mach1 -p chr1.ped -d chr1.dat -h chr1ref.hap -s chr1ref.snps --errormap step1_chr1.erate --cross step1_chr1.rec --greedy --mle --mldetails --compact --autoFlip --mask 0.02 --prefix step2_chr1

It has been more than 48 hours and the programme is still running. According to the information available on MaCH, the second step should be comparatively faster than the first. Should it take this long? Can anyone tell me what went wrong?
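Before a long run, it can help to confirm that both steps point at the same files. The following is a minimal sketch, not part of MaCH: it builds the two command lines from one shared prefix using only the flags quoted above, and lists any input file that is not on disk. The helper names are hypothetical, and it assumes step 1 writes `<prefix>.erate` and `<prefix>.rec`, as the step-2 command implies. It does not launch `mach1` itself.

```python
import shlex
from pathlib import Path

# Flags whose next argument is an input file (taken from the commands above).
FILE_FLAGS = {"-p", "-d", "-h", "-s", "--errormap", "--cross"}


def step1_command(prefix, ped, dat, hap, snp):
    """Step 1: estimate error and crossover maps (flags as quoted in the post)."""
    return ["mach1", "-p", ped, "-d", dat, "-h", hap, "-s", snp,
            "--greedy", "--rounds", "50", "--states", "200",
            "--compact", "--autoFlip", "--prefix", prefix]


def step2_command(step1_prefix, step2_prefix, ped, dat, hap, snp):
    """Step 2: impute using the maps from step 1.

    Assumption: step 1 wrote <step1_prefix>.erate and <step1_prefix>.rec.
    """
    return ["mach1", "-p", ped, "-d", dat, "-h", hap, "-s", snp,
            "--errormap", f"{step1_prefix}.erate",
            "--cross", f"{step1_prefix}.rec",
            "--greedy", "--mle", "--mldetails", "--compact", "--autoFlip",
            "--mask", "0.02", "--prefix", step2_prefix]


def missing_inputs(cmd, workdir="."):
    """Return the file arguments of cmd that do not exist under workdir."""
    missing = []
    for flag, value in zip(cmd, cmd[1:]):
        if flag in FILE_FLAGS and not (Path(workdir) / value).is_file():
            missing.append(value)
    return missing


if __name__ == "__main__":
    files = dict(ped="chr1.ped", dat="chr1.dat",
                 hap="chr1ref.hap", snp="chr1ref.snp")
    s1 = step1_command("step1_chr1", **files)
    s2 = step2_command("step1_chr1", "step2_chr1", **files)
    print(shlex.join(s1))
    print(shlex.join(s2))
    print("missing before step 2:", missing_inputs(s2))
```

Deriving both command lines from one prefix and one set of file names avoids small mismatches between the steps, such as a different spelling of the reference SNP file or of the step-1 output names.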

imputation genome • 4.9k views

ADD COMMENT • link updated 13.1 years ago by Genotepes ▴ 950 • written 13.1 years ago by Psb ▴ 30

0

Just a quick question: what's the file format for the "chr1ref.hap" file in "–h chr1ref.hap"?

Thanks!

ADD REPLY • link 7.0 years ago by moxu ▴ 510

score 3 · Answer 1 · 2012-04-01

1000g uses four imputation algorithms: IMPUTE2, BEAGLE, MaCH and SNPTools. The official release was produced by the UMich group using BEAGLE followed by MaCH. They do not use MaCH alone, because BEAGLE is faster. As I remember, the whole process took about a month on a decent cluster, which is already slow. There are several papers on phasing/imputing genotyping data. The consensus is almost always that BEAGLE is much faster than MaCH, but less accurate. So if you do everything with MaCH, it will be even slower.

score 2 · Answer 2 · 2012-03-30

2

13.1 years ago

Zev.Kronenberg 12k

You should try BEAGLE; it is very fast!

@Khader Shameer: just for you :-)

Browning & Browning 2011

ADD COMMENT • link 13.1 years ago by Zev.Kronenberg 12k

0

It would be nice if you could add links to the software and to the paper describing BEAGLE, with benchmarking details, to make the answer more informative.

ADD REPLY • link 13.1 years ago by Khader Shameer 18k

score 1 · Answer 3 · 2012-03-30

True that BEAGLE is fast, although it has quite large memory requirements.

An alternative is to use SHAPEIT for prephasing. This is simply phasing your own data; the "pre" comes from the fact that you do it before imputation.

Splitting the imputation into a (pre)phasing step and a pure imputation step is very convenient.

Everything is explained here.

I'd advise running IMPUTE without prephasing on your best hits; it doesn't take much time.

Here is the URL with SHAPEIT and all explanations.

The accuracy of this two-stage process has been evaluated, but not thoroughly, since the approach is quite recent. There is a loss of information, but it seems to be negligible, especially as the time gained is huge.

Best

http://www.shapeit.fr/