Is the .impute2 format the same as the .gen format?
7.5 years ago
Peter Chung ▴ 210

I would like to use qctool to convert a file from .impute2 format to .bgen format. However, running

qctool -g HELLO.impute2 -og HELLO.bgen

gives this error:

!! Error (genfile::BadArgumentError): In argument(s) filetype_hint="guess", inferred type = "" to function genfile::SNPDataSource::create(): Unrecognised file type..

So it seems I should convert the .impute2 file to .gen format first, and then to .bgen format.

But when I checked the file, the .impute2 format looked the same as the .gen format.

Does anyone know how to use qctool on an .impute2 file?

imputation impute2 qctool • 6.0k views

IMPUTE2 should produce output in GEN format. How did you run IMPUTE2, i.e., how was HELLO.impute2 produced?
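
One quick way to test whether a file already follows the GEN layout is to count columns: a GEN line has five leading columns followed by three genotype probabilities per sample. The sketch below is only an illustration. `check_gen_layout` is a made-up helper name, and it checks the column layout only, not the values.

```shell
# Sketch: report whether every line has 5 leading columns plus a
# multiple of 3 probability columns (the GEN layout).
# check_gen_layout is a hypothetical helper, not part of any tool.
check_gen_layout() {
  awk '
    NF < 8 || (NF - 5) % 3 != 0 { bad++; next }   # line does not fit
    { n = (NF - 5) / 3 }                          # samples on this line
    END {
      if (bad > 0 || NR == 0) { print "not GEN-like"; exit 1 }
      print "GEN-like, " n " sample(s)"
    }
  ' "$@"
}

# Demo on the single-sample line quoted in this thread (read from stdin).
printf '%s\n' '--- 1:10177:A:AC 10177 A AC 0.455 0.543 0.002' | check_gen_layout
# prints: GEN-like, 1 sample(s)
```

If the check passes on the real file (`check_gen_layout HELLO.impute2`), the `filetype_hint="guess"` in the error suggests that qctool infers the type from the file name, so a copy or symlink named HELLO.gen may be worth trying. That is an assumption and was not tested in this thread.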


You could run genipe to get the imputed file with the .impute2 suffix.


Have you found the answer? I have the same issue.


Have you found the answer? I am also facing the same issue. The format of my .impute2 file is:

--- 1:10177:A:AC 10177 A AC 0.455 0.543 0.002

Sorry for the late reply. I think I did not use qctool afterwards, since I could not solve the problem. As in the answer from @Kevin Blighe, I think I used plink to do it, and some options were needed, something like --allow-extra-chromosome. Hope it helps.

3.6 years ago

This thread is accumulating traffic, so I thought I would provide some guidance.

From what I can gather, qctool became 'outdated' over time. If your aim is to perform basic QC after an IMPUTE2 imputation, then gauge quality via the r^2 INFO scores that are contained in the *summary files produced by IMPUTE2. A score of 1 indicates perfect imputation. For further information, search for "impute2 info score".

I never actually filter out any imputed variants based on INFO score - all are retained. What I do is, as I loop over all imputed 'chunks', I build a list of variants that have INFO score >= 0.9. This list is retained in the long term and can be used to filter on the final produced VCF/BCF.
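
A minimal sketch of building such a list, under some assumptions: the per-variant scores sit in whitespace-delimited files that start with a header row containing `rs_id` and `info` columns (check this against your own IMPUTE2 output), and the name `list_high_info` and the file names in the usage note are made up for illustration. The columns are looked up by header name rather than by position, in case the layout differs between versions.

```shell
# Sketch: print the rs_id of every variant whose info score is >= 0.9.
# Assumes each input file begins with a header line that names the
# "rs_id" and "info" columns; list_high_info is a hypothetical helper.
list_high_info() {
  awk -v min="${MIN_INFO:-0.9}" '
    FNR == 1 {                      # header of each file: locate columns
      id = 0; info = 0
      for (i = 1; i <= NF; i++) {
        if ($i == "rs_id") id = i
        if ($i == "info")  info = i
      }
      next
    }
    id && info && $info + 0 >= min + 0 { print $id }
  ' "$@"
}
```

Usage could look like `list_high_info chunk*_info | sort -u > high_info_variants.txt`, with the resulting list kept alongside the final VCF/BCF. Setting `MIN_INFO` in the environment changes the threshold.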

In order to convert your IMPUTE2 data to other formats, I would recommend first converting to VCF, which is the most standardised format for genetics data, and from which it should be easy to convert to any other format. For example, PLINK will easily import VCF data and, from there, you can export again to many other formats, including the 'Oxford' GEN format (see https://www.cog-genomics.org/plink/1.9/data#recode).

IMPUTE2 output can be converted to VCF via:

mv chunk1_haps chunk1_haps.haps ;
shapeit -convert \
  --input-haps chunk1_haps.haps \
  --output-vcf chunk1.vcf ;
mv chunk1_haps.haps chunk1_haps ;

Note that, before running the above command, you will have to add a .haps extension to your IMPUTE2 output files, as elaborated here:



I have an entire pre-phasing and imputation workflow, here (last tested in 2019/20):

  1. Phasing with SHAPEIT
  2. ERROR: You must specify a valid interval for imputation using the -int argument, -use_prephased_g: command not found, in IMPUTE2

Yes, I also became tired of all of these messy threads on imputation tools.

Kevin


Thank you for your response. I have gone through all the links you provided above, and I did not find a solution to my issue.

I ran an imputation similar to your code in the second link (2). I used the -phase option for the imputation.

My issue with converting IMPUTE2 output to a VCF file with SHAPEIT is that I do not have any file to use as [input.haps].

The imputation output files are data.impute2, data.impute_haps, data.impute_info, data.impute_info_by_sample, .summary and .warning, and all of these files are saved in a separate impute output folder for each chromosome. Additionally, shapeit asks for an [impute.sample] file, which is not produced by IMPUTE2 during the imputation but by the pre-phasing with shapeit (one step before imputation).

I was following the qctool approach mentioned in "How to convert IMPUTE2 to VCF format", but qctool did not accept the data.impute2 format:

--- 1:10177:A:AC 10177 A AC 0.455 0.543 0.002

I would appreciate it if you could let me know whether I am missing anything in converting the IMPUTE2 output file to VCF using shapeit.


Hi, yes, I feel your frustration. This qctool, I have found, just does not work. I checked my code from 2019 and I had commented out the part where I was attempting to run this after having also failed [to get it to run].

It is strange that you have no haps output. Is data.impute_haps not it? It should be a reasonably large file. If you run the following, it should work:

shapeit \
    -convert \
    --input-haps data.impute_haps \
    --output-vcf data.impute.vcf ;

Thanks. I tried data.impute_haps as an input too, but apparently the program takes the prefix [data.impute_haps] and adds the suffix [.haps] to it. Therefore, the file data.impute_haps.haps would already have to be among the output files to be used. Unless I am doing something wrong!


Maybe this one can help as well: GTOOL can be used to convert datasets stored in GEN file format into PED files. https://www.well.ox.ac.uk/~cfreeman/software/gwas/gtool.html See if this works.


I think that you do have to rename it and add the .haps extension. I have made this clearer in my answer.
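
As a rough sketch of that renaming step, symlinks can give shapeit a prefix for which both a .haps and a .sample file exist, without touching the IMPUTE2 output itself. The helper name `stage_for_shapeit` and the file name `prephased.sample` (standing in for the sample file from the pre-phasing step) are made up for illustration, and the sketch assumes all files sit in the current directory.

```shell
# Sketch: create <prefix>.haps and <prefix>.sample as symlinks to the
# IMPUTE2 haplotype output and the sample file from pre-phasing.
# stage_for_shapeit is a hypothetical helper; paths are assumed to be
# in the current directory so the relative links resolve.
stage_for_shapeit() {
  haps=$1 sample=$2 prefix=$3
  [ -f "$haps" ] && [ -f "$sample" ] || { echo "missing input file" >&2; return 1; }
  ln -sf "$haps"   "$prefix.haps"
  ln -sf "$sample" "$prefix.sample"
}

# Example call (file names are placeholders):
# stage_for_shapeit data.impute_haps prephased.sample data.impute
```

After staging, the conversion would become `shapeit -convert --input-haps data.impute --output-vcf data.impute.vcf`, assuming shapeit treats a single `--input-haps` argument as a prefix, as reported earlier in this thread.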


Thank you for the clarification. My confusion came from assuming that the software would produce the [.haps] files automatically. Now it makes more sense. Cheers
