It’s a little bit convoluted.
I just want to impute SNPs in 23andme format like the following:
# rsid chromosome position genotype
rs3094315 1 752566 AA
rs12562034 1 768448 AA
rs3934834 1 1005806 CC
rs9442372 1 1018704 GG
rs3737728 1 1021415 GG
rs11260588 1 1021658 GG
rs6687776 1 1030565 CT
I was told Minimac3 is the best tool for imputing one sample at a time (I am not looking to impute multiple samples at once; for various reasons I need to go sample by sample). Minimac3 is easy to use and fast, and I got it working. However, it requires a phased input file, so I need to phase the file described above.
Eagle from the Broad Institute was recommended for phasing. It seems that Eagle only takes a genomic profile in VCF format, so I converted the above file into VCF as follows:
##fileformat=VCFv4.2
##filedate=Fri Aug 26 23:11:37 EDT 2016
##source=csv2vcf.pl
##reference=
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT GENOTYPE
1 752566 rs3094315 G A . . . GT 1/1
1 768448 rs12562034 G A . . . GT 1/1
1 1005806 rs3934834 C T . . . GT 0/0
1 1018704 rs9442372 A G . . . GT 1/1
1 1021415 rs3737728 A G . . . GT 1/1
1 1021658 rs11260588 G A . . . GT 0/0
and named it “myprofile.vcf”. Then I ran eagle using the following:
eagle --vcf myprofile.vcf --geneticMapFile Eagle/tables/genetic_map_hg19_withX.txt.gz --outPrefix /tmp/myprofile.beagleImputed
“Eagle/tables/genetic_map_hg19_withX.txt.gz” was provided by eagle.
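For reference, the 23andMe-to-VCF conversion described above can be sketched in Python roughly as follows. This is only a minimal sketch under stated assumptions, not the csv2vcf.pl script used here: the `ref_alt` lookup, the function names, and the default sample name are hypothetical, and the REF/ALT alleles must come from an external source such as a reference panel, because the 23andMe file does not carry them. The sketch writes tab-separated records and adds a `##contig` header line for each chromosome it sees, which the header shown above lacks.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical lookup: rsid -> (REF, ALT). Assumed to come from a reference
# panel or similar resource; it is not part of the 23andMe file.
RefAlt = Dict[str, Tuple[str, str]]


def genotype_to_gt(genotype: str, ref: str, alt: str) -> str:
    """Map a two-letter 23andMe genotype such as 'CT' to an unphased VCF GT."""
    if len(genotype) != 2:
        return "./."  # haploid or malformed calls are treated as missing here
    codes: List[int] = []
    for allele in genotype:
        if allele == ref:
            codes.append(0)
        elif allele == alt:
            codes.append(1)
        else:
            return "./."  # no-call ('--') or an allele matching neither REF nor ALT
    return "/".join(str(c) for c in sorted(codes))


def to_vcf(lines: Iterable[str], ref_alt: RefAlt, sample: str = "SAMPLE") -> str:
    """Build VCF text from 23andMe lines; SNPs without a REF/ALT entry are skipped."""
    contigs: List[str] = []
    records: List[str] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        rsid, chrom, pos, genotype = line.split()
        if rsid not in ref_alt:
            continue
        ref, alt = ref_alt[rsid]
        if chrom not in contigs:
            contigs.append(chrom)
        gt = genotype_to_gt(genotype, ref, alt)
        records.append("\t".join([chrom, pos, rsid, ref, alt, ".", ".", ".", "GT", gt]))
    header = ["##fileformat=VCFv4.2"]
    header += [f"##contig=<ID={c}>" for c in contigs]
    header.append('##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">')
    header.append("\t".join(
        ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT", sample]
    ))
    return "\n".join(header + records) + "\n"
```

Sorting the allele codes means that both `CT` and `TC` come out as `0/1`, which is fine for unphased input that will be phased afterwards.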
It didn’t go through. The error I got was:
[W::vcf_parse] contig '1' is not defined in the header. (Quick workaround: index the file with tabix.)
ERROR: Multi-allelic site found (i.e., ALT contains multiple alleles)
Either drop or split (bcftools norm -m) multi-allelic variants
I'm not sure what exactly this means. “Index the file with tabix”: index which file? It can't be the “genetic_map_hg19_withX.txt.gz” file, right? So I tried “tabix myprofile.vcf” and got the following error:
Not a BGZF file: data/genome_3j.vcf
tbx_index_build failed: data/genome_3j.vcf
At this point the errors make no sense to me. I have probably done something terribly wrong.
Can someone please help? Either with eagle/tabix or some other workaround.
I just want to impute some SNPs in this very popular and easy format. Can't someone write a program that takes such a file as input, plus a couple of options pointing to the needed reference SNP database and/or genome sequences? Actually, someone has already done that: the Michigan Imputation Server. But you need to register an account, upload your data to their server, and download the results from there. This is awesome and the way to go in terms of ease of use, but you cannot build the server into a pipeline.
The “manuals” or “READMEs” or “instructions” are not good enough for me.
Thanks for any instructions.
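The two messages quoted above (undefined contig, multi-allelic site) can be checked for before running Eagle. The following is a minimal sketch, assuming the VCF is small enough to read as text; `preflight` is a hypothetical helper, not part of Eagle, tabix, or bcftools. It also flags data lines that are not tab-separated, since the VCF format requires tabs. Whether any of these conditions is what actually triggered the Eagle error cannot be confirmed from the post alone.

```python
from typing import List, Set


def preflight(vcf_text: str) -> List[str]:
    """Return a list of human-readable problems found in uncompressed VCF text."""
    problems: List[str] = []
    declared: Set[str] = set()
    for n, line in enumerate(vcf_text.splitlines(), start=1):
        if line.startswith("##contig=<"):
            # e.g. ##contig=<ID=1,length=249250621>
            body = line[len("##contig=<"):].rstrip(">")
            for field in body.split(","):
                if field.startswith("ID="):
                    declared.add(field[3:])
            continue
        if line.startswith("#") or not line.strip():
            continue
        if "\t" not in line:
            problems.append(f"line {n}: fields are not tab-separated")
            continue
        cols = line.split("\t")
        if len(cols) < 5:
            problems.append(f"line {n}: fewer than 5 columns")
            continue
        chrom, alt = cols[0], cols[4]
        if chrom not in declared:
            problems.append(f"line {n}: contig '{chrom}' is not defined in the header")
        if "," in alt:
            problems.append(f"line {n}: multi-allelic ALT '{alt}'")
    return problems
```

An empty list means none of these three conditions was found; it does not guarantee that the file is otherwise valid.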
Great! After doing the bgzip on the .vcf, it went much further. I ran
and got the following error:
Not sure why eagle requires this much -- I guess -- memory.
It didn't complain about multi-allelic sites this time, although I have not dealt with that problem yet because I don't know how. Could you please give me the bcftools command line for the splitting? I don't see a split command or option, or anything that looks related.
Thanks much!
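On the bcftools question: the splitting is done by the `-m` option of `bcftools norm` (the error message itself points to `bcftools norm -m`), and `-m -any` splits multi-allelic records of any type into biallelic ones. A minimal sketch of the whole preparation sequence, written as a small Python wrapper so it can go into a pipeline, might look like this. The function names and the output file name are hypothetical, and it assumes `bgzip`, `tabix`, and `bcftools` are on the `PATH`.

```python
import shlex
import subprocess
from typing import List


def prep_commands(vcf_path: str, split_out: str) -> List[List[str]]:
    """Build the commands: compress, index, split multi-allelic sites, index again."""
    gz = vcf_path + ".gz"
    return [
        # bgzip writes <vcf_path>.gz in BGZF format, replacing the plain file;
        # tabix refuses plain text ("Not a BGZF file").
        ["bgzip", vcf_path],
        ["tabix", "-p", "vcf", gz],
        # -m -any: split multi-allelic records; -O z: bgzip-compressed output.
        ["bcftools", "norm", "-m", "-any", "-O", "z", "-o", split_out, gz],
        ["tabix", "-p", "vcf", split_out],
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch has no side effects.
    for cmd in prep_commands("myprofile.vcf", "myprofile.split.vcf.gz"):
        print(shlex.join(cmd))
```

Calling `run_all(prep_commands(...))` would execute the steps; the resulting split file is then what gets passed to `eagle --vcf`.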
This may have been answered, but may I please check if there's a solution to the memory issue? I am having the same problem whilst trying to run Eagle2. Thanks so much.
I am also facing the exact same error. May I know how you fixed this issue?
Did anyone figure out a solution to this error? ERROR: Failed to allocate 18446744073709551596 bytes
I keep getting it and I can't find an answer anywhere online.
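One observation about the number in that message: 18446744073709551596 is just below 2^64, far more than any machine has, so it is unlikely to be a real memory requirement. A quick check shows it is exactly what -20 looks like when read as an unsigned 64-bit integer. That suggests a size calculation went negative somewhere before the allocation; the underlying cause (for example, something in the input not being read as expected) is an assumption and not something this arithmetic can confirm.

```python
# Value copied from the error message above.
reported = 18446744073709551596

UINT64_MODULUS = 2 ** 64


def as_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed (two's complement) one."""
    return value - UINT64_MODULUS if value >= 2 ** 63 else value


print(UINT64_MODULUS - reported)  # 20
print(as_signed_64(reported))     # -20
```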
I have the same problem, did anybody find a solution?
Sorry, I found the problem!