I have .gprobs, .metrics and .sample output files from IMPUTE2 and am trying to run an association test using PLINK. I have uploaded the first 5 lines of the chromosome 12 files here:
- .gprobs: https://dl.dropboxusercontent.com/u/6723585/IMPUTE2/Chr12_head_5.gprobs
- .metrics: https://dl.dropboxusercontent.com/u/6723585/IMPUTE2/Chr12_head_5.metrics
- .sample: https://dl.dropboxusercontent.com/u/6723585/IMPUTE2/Chr12_head_5.sample
I have tried to run a --dosage analysis using the .gprobs and .sample files to do the association test at the chromosome level.
However, I am getting several warnings:
- "Duplicate individual found"
and the error:
- ERROR: Badly aligned columns for: SNP A1 A2
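One way to narrow down where the warnings and the alignment error come from is to inspect the .gprobs header and column counts directly. The sketch below assumes a BEAGLE-style .gprobs layout (a header "marker alleleA alleleB id1 id1 id1 id2 id2 id2 ...", then rows of marker, two alleles and three probabilities per sample). PLINK 1.07's dosage header is, as far as I know, expected to list each individual once as FID/IID pairs, so a header that repeats every ID three times is one plausible source of "Duplicate individual found"; rows with the wrong number of columns would be a plausible source of "Badly aligned columns". The function name check_gprobs is hypothetical.

```python
from collections import Counter


def check_gprobs(lines):
    """Report repeated header IDs, sample count, and misaligned rows.

    Assumes a BEAGLE-style .gprobs layout (hypothetical helper):
      header: marker alleleA alleleB id1 id1 id1 id2 id2 id2 ...
      rows:   marker A B p_AA p_AB p_BB ... (three values per sample)
    """
    header = lines[0].split()
    ids = header[3:]
    # In this layout every sample ID appears three times (one per genotype).
    counts = Counter(ids)
    repeated = sorted(i for i, c in counts.items() if c > 1)
    n_samples = len(ids) // 3
    expected = 3 + 3 * n_samples
    # Collect (line number, column count) for rows that do not match the header.
    bad_rows = [
        (lineno, len(fields))
        for lineno, fields in enumerate((l.split() for l in lines[1:]), start=2)
        if len(fields) != expected
    ]
    return repeated, n_samples, bad_rows


example = [
    "marker alleleA alleleB S1 S1 S1 S2 S2 S2",
    "rs1 A G 0.9 0.1 0.0 0.0 1.0 0.0",
    "rs2 C T 1.0 0.0 0.0",  # truncated row: only one sample's probabilities
]
print(check_gprobs(example))
```

Running it on the real file (e.g. lines read from Chr12_head_5.gprobs) would show whether the sample count in the header matches the .sample file and whether any row is short.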
I have also tried to convert the .gprobs and .sample files to native PLINK .ped and .fam files using GTOOL and then run the association test in PLINK, but the converted files did not work with the --assoc command either. Is any file conversion required before taking IMPUTE2 output into PLINK, or do you recommend another tool for association testing on IMPUTE2 output?
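If a conversion step is needed, the core of it is turning per-sample genotype probabilities into something PLINK's --dosage reads. PLINK 1.07's --dosage documentation describes a format= option for one, two or three values per individual; please verify the exact option and which allele the dosage counts against the docs. The sketch below assumes the IMPUTE2 .gen layout (SNP_ID, rsID, position, allele A, allele B, then P(AA) P(AB) P(BB) per sample) and computes the expected count of allele B, i.e. P(AB) + 2·P(BB). The function name and the choice to list allele B first are assumptions for illustration.

```python
def gen_line_to_dosage(line, ndigits=4):
    """Convert one IMPUTE2 .gen row to expected allele-B dosages (hypothetical helper).

    Assumed layout: SNP_ID rsID position alleleA alleleB, then three
    probabilities P(AA), P(AB), P(BB) for each sample.
    """
    fields = line.split()
    _snp_id, rsid, _pos, allele_a, allele_b = fields[:5]
    probs = [float(x) for x in fields[5:]]
    if len(probs) % 3 != 0:
        raise ValueError(f"{rsid}: {len(probs)} probabilities is not a multiple of 3")
    dosages = []
    for i in range(0, len(probs), 3):
        _p_aa, p_ab, p_bb = probs[i:i + 3]
        # Expected number of B alleles: 0*P(AA) + 1*P(AB) + 2*P(BB)
        dosages.append(round(p_ab + 2 * p_bb, ndigits))
    # Counted allele (B) listed first, other allele second (assumed convention)
    return rsid, allele_b, allele_a, dosages


row = gen_line_to_dosage("--- rs123 1000 A G 0.9 0.1 0 0 0.2 0.8")
print(" ".join([*row[:3], *map(str, row[3])]))
# prints: rs123 G A 0.1 1.8
```

Collapsing to a single dosage discards some information compared with passing all three probabilities, so if PLINK accepts the three-probability layout directly that may be preferable.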
PS: I have tried to ask these questions on the IMPUTE2 mailing list, but my membership still has not been approved, even 24 hours after confirming my email.
Thanks, zx8754, for a great answer. Do you know whether the .gen format that you mentioned and the .gprobs files that I have are the same?
Not sure what a .gprobs file looks like, but I have added what my .gen files from IMPUTE2 output look like. This is the first line from the file. I think it's the same format.
Google tells me that .gprobs is a BEAGLE output?
True, it's a native BEAGLE format. I downloaded this dataset from dbGaP; according to the phenotype description the data were imputed using IMPUTE2, but the output files are described as "chromosome-specific genotype probabilities files".
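A quick way to tell the two layouts apart is to look at the first line of a file. The sketch below assumes that a BEAGLE .gprobs file starts with a header beginning "marker alleleA alleleB", while an IMPUTE2 .gen file has no header and every row has 5 leading columns (SNP_ID, rsID, position, allele A, allele B) followed by triples of probabilities. The function name guess_format is hypothetical.

```python
def guess_format(first_line):
    """Guess whether a file is BEAGLE .gprobs or IMPUTE2 .gen from its first line.

    Assumptions (hypothetical helper):
      - .gprobs: header line starting with 'marker alleleA alleleB'
      - .gen: no header; 5 leading columns, then 3 probabilities per sample
    """
    fields = first_line.split()
    if fields[:3] == ["marker", "alleleA", "alleleB"]:
        return "beagle_gprobs"
    # A .gen row has at least one sample and a multiple of 3 probability columns.
    if len(fields) >= 8 and (len(fields) - 5) % 3 == 0:
        try:
            int(fields[2])                    # base-pair position
            [float(x) for x in fields[5:]]    # genotype probabilities
        except ValueError:
            return "unknown"
        return "impute2_gen"
    return "unknown"


print(guess_format("marker alleleA alleleB S1 S1 S1"))
print(guess_format("--- rs123 1000 A G 0.9 0.1 0 0 0.2 0.8"))
```

A .gprobs data row (3 leading columns plus 3 per sample) always leaves a remainder of 1 in the (len - 5) % 3 check, so it cannot be mistaken for a .gen row by this test.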
From your Dropbox files, I created map and fam files and cut the gprobs file down to 3 samples (as there were 3 samples in the .sample file), and --dosage did work.
That's great! Can you please add that part to your answer as well?
The answer has been updated according to the data provided.
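Two of the steps described in the comments, building a .fam file from the .sample file and cutting the gprobs file down to the number of samples listed there, can be sketched as follows (the map file is not covered). This assumes the IMPUTE2/Oxford .sample layout (line 1 column names, line 2 column types, then one row per sample starting with ID_1 and ID_2) and the BEAGLE-style .gprobs layout with 3 leading columns. The helper names, and the placeholder values for parents, sex (0) and phenotype (-9), are hypothetical; a real phenotype column is needed for an association test.

```python
def sample_to_fam(sample_lines, missing_pheno="-9"):
    """Build PLINK .fam lines from an IMPUTE2 .sample file (hypothetical helper).

    Skips the two header lines (column names and column types) and uses
    ID_1/ID_2 as FID/IID. Parents, sex and phenotype are placeholders.
    """
    fam = []
    for line in sample_lines[2:]:
        fields = line.split()
        if not fields:
            continue
        fid, iid = fields[0], fields[1]
        # FID IID PID MID SEX PHENOTYPE (placeholder parents/sex/phenotype)
        fam.append(f"{fid} {iid} 0 0 0 {missing_pheno}")
    return fam


def trim_gprobs(gprobs_lines, n_samples):
    """Keep the 3 leading columns plus 3 probability columns per sample."""
    keep = 3 + 3 * n_samples
    return [" ".join(line.split()[:keep]) for line in gprobs_lines]


sample = ["ID_1 ID_2 missing", "0 0 0", "F1 I1 0", "F2 I2 0", "F3 I3 0"]
fam = sample_to_fam(sample)
gprobs = [
    "marker alleleA alleleB a a a b b b c c c d d d",
    "rs1 A G 1 0 0 0 1 0 0 0 1 1 0 0",
]
trimmed = trim_gprobs(gprobs, len(fam))
print("\n".join(fam))
print("\n".join(trimmed))
```

Taking the sample count from the .sample file (rather than hard-coding 3) keeps the trimmed gprobs columns in step with the .fam file if more samples are added.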
That's great! In the meantime, I was able to run SNPTEST2 on my datasets seamlessly; I will post a detailed reply here so that Biostars users with IMPUTE2 data can try both ways.