Is there a way to convert PLINK PED files into the HapMap genotype format? I have PLINK PED files that I want to analyse with a program that requires HapMap genotype format. Both examples below come from HapMap, so I am guessing the conversion is doable:
From this example file ([...] means more of the same):
wget -qO- ftp://ftp.ncbi.nlm.nih.gov/hapmap/genotypes/2009-01_phaseIII/plink_format/hapmap3_r2_b36_fwd.consensus.qc.poly.ped.bz2 | bunzip2 -c | head -n 1 | less
2427 NA19919 NA19908 NA19909 1 -9 C C C C C C A A G G G G C C T T G G [...] A A
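To make the PED layout above concrete, here is a minimal Python sketch that splits one such line into its six leading sample columns (family ID, sample ID, father, mother, sex, phenotype) and pairs up the remaining alleles. The function name `parse_ped_line` and the dictionary keys are made up for illustration, and whitespace-delimited input is assumed.

```python
def parse_ped_line(line):
    """Split one PLINK .ped line into sample fields and per-SNP genotypes."""
    fields = line.split()
    # A .ped line has 6 leading columns followed by two alleles per SNP.
    if len(fields) < 6 or (len(fields) - 6) % 2 != 0:
        raise ValueError("expected 6 leading columns followed by allele pairs")
    family, sample, father, mother, sex, phenotype = fields[:6]
    alleles = fields[6:]
    # Join consecutive alleles: ["C", "C", "A", "A"] becomes ["CC", "AA"].
    genotypes = [alleles[i] + alleles[i + 1] for i in range(0, len(alleles), 2)]
    return {
        "family": family,
        "sample": sample,
        "father": father,
        "mother": mother,
        "sex": sex,
        "phenotype": phenotype,
        "genotypes": genotypes,
    }


# The first 9 SNPs of the example line (the elided part is left out).
record = parse_ped_line(
    "2427 NA19919 NA19908 NA19909 1 -9 C C C C C C A A G G G G C C T T G G"
)
print(record["sample"], record["genotypes"])
```

Note that a PED file is sample-major (one line per individual), whereas HapMap is SNP-major (one line per SNP), so any conversion has to transpose the genotype matrix and take SNP names and positions from the accompanying MAP file.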
To this file format ([...] means more of the same):
wget -qO- ftp://ftp.ncbi.nlm.nih.gov/hapmap/genotypes/2009-01_phaseIII/hapmap_format/consensus/genotypes_chr10_ASW_phase3.2_consensus.b36_fwd.txt.gz | gunzip -c | head
center protLSID assayLSID panelLSID QCcode NA19625 NA19700 NA19701 NA19702 NA19703 NA19704 [...] NA20364
rs12255619 A/C chr10 88481 + ncbi_b36 bbs urn:lsid:bbs.hapmap.org:Protocol:Phase3_Draft1:1 urn:lsid:bbs.hapmap.org:Assay:Phase3_Draft1_rs12255619:1 urn:lsid:dcc.hapmap.org:Panel:US_African-30-trios:3 QC+ AA AA AC AA AA AA [...] AA
For more details on the HapMap format:
-HapMap file format:
The current release consists of text-table files only, with the following columns:
Col1: refSNP rs# identifier at the time of release (NB: it might merge
with another rs# in the future)
Col2: SNP alleles according to dbSNP
Col3: chromosome that SNP maps to
Col4: chromosome position of SNP, in basepairs on reference sequence
Col5: strand of reference sequence that SNP maps to
Col6: version of reference sequence assembly (currently NCBI build36)
Col7: HapMap genotyping center that produced the genotypes
Col8: LSID for HapMap protocol used for genotyping
Col9: LSID for HapMap assay used for genotyping
Col10: LSID for panel of individuals genotyped
Col11: QC-code, currently 'QC+' for all entries (for future use)
Col12 and on: observed genotypes of samples, one per column, sample
identifiers in column headers (Coriell catalog numbers, example:
NA10847).
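As a rough illustration of the column layout described above, the following Python sketch splits one HapMap row into the 11 fixed columns and the per-sample genotypes. The key names in `HAPMAP_FIXED_COLUMNS` and the function name `parse_hapmap_row` are my own labels for Col1 to Col11, not official names, and whitespace-delimited rows are assumed.

```python
# Hypothetical labels for Col1..Col11 of the HapMap format description.
HAPMAP_FIXED_COLUMNS = [
    "rs_id", "alleles", "chrom", "pos", "strand", "assembly",
    "center", "prot_lsid", "assay_lsid", "panel_lsid", "qc_code",
]


def parse_hapmap_row(line, sample_ids):
    """Return (metadata dict, {sample_id: genotype}) for one HapMap row."""
    fields = line.split()
    n_fixed = len(HAPMAP_FIXED_COLUMNS)
    if len(fields) != n_fixed + len(sample_ids):
        raise ValueError("row length does not match 11 columns plus samples")
    meta = dict(zip(HAPMAP_FIXED_COLUMNS, fields[:n_fixed]))
    meta["pos"] = int(meta["pos"])  # Col4 is a base-pair position
    genotypes = dict(zip(sample_ids, fields[n_fixed:]))  # Col12 and on
    return meta, genotypes


# First three samples of the example row shown earlier.
row = (
    "rs12255619 A/C chr10 88481 + ncbi_b36 bbs "
    "urn:lsid:bbs.hapmap.org:Protocol:Phase3_Draft1:1 "
    "urn:lsid:bbs.hapmap.org:Assay:Phase3_Draft1_rs12255619:1 "
    "urn:lsid:dcc.hapmap.org:Panel:US_African-30-trios:3 QC+ AA AA AC"
)
meta, genotypes = parse_hapmap_row(row, ["NA19625", "NA19700", "NA19701"])
print(meta["rs_id"], meta["chrom"], meta["pos"], genotypes)
```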
Hi Pierre,
Have you ever tried the paraHaplo software? I would like to use it because of its capability for multiprocessing with MPI. I used your script converttpedto_hapmap.pl to convert my tped data file to HapMap format, but I ran into a problem.
When I try to run the haplotypePhasing module with the command "../../src/SNP/bin/haplotypePhasing.exe ../../data/test.tped.hapmap hblockdef.txt phasedout_test.txt 1 9331"
I get this error: "input file open error! FileName:phasedouttest.txt_output0"
Unfortunately, I don't know what the real problem could be. It may be a paraHaplo problem, or a problem with the HapMap format generated by your script. So I am wondering whether you have any experience with paraHaplo?
The files could also come from a vcf source:
~/src/tabix-0.2.6/bgzip -c test.100000.vcf > test.100000.vcf.gz
~/src/vcftools_0.1.8a/bin/vcftools --gzvcf test.100000.vcf.gz --plink-tped
Then converting the tped files into hapmap could be done with the answers in this biostar question.
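For reference, a TPED-to-HapMap step could look roughly like the Python sketch below. It is not the script from the linked answers, just an illustration under several assumptions: the strand is written as "+", the metadata columns that TPED does not carry (assembly, center, LSIDs, QC code) are filled with the placeholder "NA", PLINK's missing allele "0" becomes "NN", chromosomes get a "chr" prefix as in the HapMap example, and the first six header names ("rs# alleles chrom pos strand assembly#") are written from memory of HapMap files. The function names are made up.

```python
def tped_line_to_hapmap(line, assembly="NA", placeholder="NA"):
    """Convert one PLINK .tped line into one HapMap-style row (a sketch)."""
    fields = line.split()
    # TPED columns: chromosome, SNP id, genetic distance, bp position, alleles.
    chrom, snp_id, _genetic_distance, pos = fields[:4]
    alleles = fields[4:]
    if len(alleles) % 2 != 0:
        raise ValueError("expected two alleles per sample")
    genotypes, observed = [], set()
    for i in range(0, len(alleles), 2):
        a, b = alleles[i], alleles[i + 1]
        if "0" in (a, b):
            genotypes.append("NN")  # assumed missing-genotype code
        else:
            genotypes.append(a + b)
            observed.update((a, b))
    # Assumption: the allele column lists the alleles seen in the data,
    # which may differ from the dbSNP alleles that real HapMap files report.
    allele_col = "/".join(sorted(observed)) if observed else "N/N"
    if not chrom.startswith("chr"):
        chrom = "chr" + chrom
    row = [snp_id, allele_col, chrom, pos, "+", assembly]
    row += [placeholder] * 5  # center, protLSID, assayLSID, panelLSID, QCcode
    return " ".join(row + genotypes)


def hapmap_header(tfam_lines):
    """Build the header row; sample IDs are column 2 of the .tfam file."""
    samples = [line.split()[1] for line in tfam_lines if line.strip()]
    fixed = ("rs# alleles chrom pos strand assembly# center "
             "protLSID assayLSID panelLSID QCcode")
    return " ".join([fixed] + samples)


print(hapmap_header(["2427 NA19919 NA19908 NA19909 1 -9"]))
print(tped_line_to_hapmap("10 rs12255619 0 88481 A A A C 0 0"))
```

Whether a downstream program accepts the placeholder metadata columns depends on how strictly it parses them, so it is worth checking the output against a real HapMap file first.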