6.6 years ago
bha
I want to convert haplotype data to PLINK format (.map and .ped, or binary .fam, .bim, .bed). What is the best software or R package for doing this easily? Has anybody come across this? The haplotype data is the output of HAPGEN2 (a program that simulates sequence data; unfortunately, it has no function to convert its output back to PLINK format).
Is this the output file format: http://www.stats.ox.ac.uk/~marchini/software/gwas/file_format.html ?
Yes! Do you have any idea how to convert this to PLINK format?
Could you post a small subset of your file somewhere?
The genotype file is in exactly the same format as in the link you mentioned above, and the haplotype file contains 0s and 1s in the standard format. The genotype file looks like:
So, at SNP3 the two alleles are C and T, so the set of 3 probabilities for each individual corresponds to the genotypes CC, CT and TT respectively.
Note: columns 2 and 3 (which contain the RS ID and base-pair position of the SNPs) are set arbitrarily in this example.
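For readers following along, here is a minimal Python sketch of how one line of such a genotype file could be turned into hard genotype calls suitable for a PLINK .ped file. It assumes the layout described at the link above (five leading columns: SNP ID, RS ID, position, allele A, allele B, followed by three probabilities per individual). The function name `gen_line_to_calls` and the `threshold` parameter are hypothetical, not part of HAPGEN2 or PLINK; a call is made only when the largest probability reaches the threshold, and is otherwise written as PLINK's missing genotype `0 0`.

```python
def gen_line_to_calls(line, threshold=0.9):
    """Turn one genotype-file line into (rs_id, position, calls).

    Assumed layout: SNP_ID RS_ID POS ALLELE_A ALLELE_B, then three
    probabilities (AA, AB, BB) per individual.
    """
    fields = line.split()
    _snp_id, rs_id, pos, allele_a, allele_b = fields[:5]
    probs = [float(x) for x in fields[5:]]
    if len(probs) % 3 != 0:
        raise ValueError("expected three probabilities per individual")

    # Genotypes in the same order as the probability triple.
    genotypes = [
        (allele_a, allele_a),
        (allele_a, allele_b),
        (allele_b, allele_b),
    ]

    calls = []
    for i in range(0, len(probs), 3):
        triple = probs[i:i + 3]
        best = max(range(3), key=lambda k: triple[k])
        if triple[best] >= threshold:
            calls.append(genotypes[best])
        else:
            # Uncertain genotype: PLINK's missing code in .ped files.
            calls.append(("0", "0"))
    return rs_id, int(pos), calls


if __name__ == "__main__":
    example = "SNP3 rs3 3000 C T 1 0 0 0 1 0 0 0 1"
    print(gen_line_to_calls(example))
```

Since a .ped file has one row per individual while this format has one row per SNP, the calls would still need to be transposed before writing them out.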
And you have a sample file as well? Are these files simply zipped and not binary?
Yes, I do have a sample file as well. It's NOT zipped. Here is what the output files look like: http://mathgen.stats.ox.ac.uk/genetics_software/hapgen/hapgen2.html#top
And I imagine these probabilities are not just 0 and 1 but can be, for example, 0.4? But they have to sum to 1, I suppose.
These probabilities are 0s and 1s, and the haplotypes are also 0s and 1s. What do you suggest?
But they could potentially be 0.5 0.5 0? Just to be sure.
Yes, essentially they are. I have both the genotype and the haplotype files. My main concern is to convert them to PLINK format, either the haplotypes or the genotypes. Any ideas, please?
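As a rough illustration of the haplotype route, the Python sketch below builds .ped and .map lines from 0/1 haplotypes. It rests on assumptions that should be checked against the actual files: one row per SNP, one column per haplotype, each consecutive pair of columns belonging to one individual, and the alleles coded 0 and 1 at each SNP available separately (for example from a legend file). The function names and the `id_prefix` parameter are hypothetical. Parents are written as 0, sex as 0 (unknown) and phenotype as -9 (missing), following the .ped convention.

```python
def haps_to_ped_lines(hap_rows, alleles, id_prefix="id"):
    """Build .ped lines from 0/1 haplotypes.

    hap_rows: list of rows, one per SNP, each a list of 0/1 ints
              (assumed: columns 2*i and 2*i+1 belong to individual i).
    alleles:  list of (allele_for_0, allele_for_1), one pair per SNP.
    """
    if not hap_rows:
        return []
    n_haps = len(hap_rows[0])
    if n_haps % 2 != 0:
        raise ValueError("expected an even number of haplotype columns")
    if len(alleles) != len(hap_rows):
        raise ValueError("need one allele pair per SNP")

    lines = []
    for ind in range(n_haps // 2):
        genotype_fields = []
        for row, (a0, a1) in zip(hap_rows, alleles):
            for h in (row[2 * ind], row[2 * ind + 1]):
                genotype_fields.append(a1 if h == 1 else a0)
        name = f"{id_prefix}_{ind}"
        # FID IID PID MID SEX PHENO, then two alleles per SNP.
        lines.append(" ".join([name, name, "0", "0", "0", "-9"] + genotype_fields))
    return lines


def map_lines(chrom, snps):
    """Build .map lines (chromosome, SNP ID, genetic distance, bp position).

    snps: list of (rs_id, position); genetic distance is written as 0.
    """
    return [f"{chrom} {rs_id} 0 {pos}" for rs_id, pos in snps]


if __name__ == "__main__":
    haps = [[0, 1, 1, 1], [0, 0, 1, 0]]
    print("\n".join(haps_to_ped_lines(haps, [("C", "T"), ("A", "G")])))
    print("\n".join(map_lines("1", [("rs1", 100), ("rs2", 200)])))
```

Note that writing phased haplotypes into a .ped file discards the phase information, because PLINK treats the two alleles of a genotype as unordered.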
I can code a module to import them using glactools and export to PLINK. We currently do not support this format, but I could code it.
It seems that GTOOL can do this conversion: http://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html#formats
Did you try it? Did it work?
Yes, I think it works well.