Hello everyone,
I still have not found a way to convert this genotype data to PLINK format. The raw data looks like this:
rs# alleles chrom pos strand assembly# center protLSID assayLSID panelLSID QCcode NA06984 NA06985 NA06986 NA06989 NA06991 NA06993 NA06994 NA06995 NA06997 NA07000 NA07014 NA07019 NA07022 NA07029 NA07031 NA07034 NA07037 NA07045 NA07048 NA07051 NA07055 NA07056 NA07345 NA07346 NA07347 NA07348 NA07349 NA07357 NA07435 NA10830 NA10831 NA10835 NA10836 NA10837 NA10838 NA10839 NA10840 NA10843 NA10845 NA10846 NA10847 NA10850 NA10851 NA10852 NA10853 NA10854 NA10855 NA10856 NA10857 NA10859 NA10860 NA10861 NA10863 NA10864 NA10865 NA11829 NA11830 NA11831 NA11832 NA11839 NA11840 NA11843 NA11881 NA11882 NA11891 NA11892 NA11893 NA11894 NA11917 NA11918 NA11919 NA11920 NA11930 NA11931 NA11992 NA11993 NA11994 NA11995 NA12003 NA12004 NA12005 NA12006 NA12043 NA12044 NA12045 NA12056 NA12057 NA12144 NA12145 NA12146 NA12154 NA12155 NA12156 NA12234 NA12236 NA12239 NA12248 NA12249 NA12264 NA12272 NA12273 NA12275 NA12282 NA12283 NA12286 NA12287 NA12335 NA12336 NA12340 NA12341 NA12342 NA12343 NA12344 NA12347 NA12348 NA12375 NA12376 NA12383 NA12386 NA12399 NA12400 NA12413 NA12489 NA12546 NA12707 NA12708 NA12716 NA12717 NA12718 NA12739 NA12740 NA12748 NA12749 NA12750 NA12751 NA12752 NA12753 NA12760 NA12761 NA12762 NA12763 NA12766 NA12767 NA12775 NA12776 NA12777 NA12778 NA12801 NA12802 NA12812 NA12813 NA12814 NA12815 NA12817 NA12818 NA12827 NA12828 NA12829 NA12830 NA12832 NA12842 NA12843 NA12864 NA12865 NA12872 NA12873 NA12874 NA12875 NA12877 NA12878 NA12889 NA12890 NA12891 NA12892
rs10399749 C/T chr1 45162 + ncbi_b36 perlegen urn:lsid:perlegen.hapmap.org:Protocol:Genotyping_1.0.0:2 urn:lsid:perlegen.hapmap.org:Assay:25761.5318498:1 urn:lsid:dcc.hapmap.org:Panel:CEPH-30-trios:1 QC+ NN CC NN NN NN CC CC NN NN CC NN CC CC CC NN CC NN NN NN NN CC CC CC NN NN CC NN CC NN CC CC CC NN NN CC CC NN NN NN CC CC NN CC NN NN CC CC CC CC CC CC CC CC NN NN CC CC CC CC CC CC NN CC CC NN NN NN NN NN NN NN NN NN NN CC CC CC NN CC CC CC CC CC CC NN CC CC CC CC CC CC CC CC CC CC CC CC CC CC NN NN NN NN NN NN NN NN NN NN NN NN NN NN NN NN NN NN NN NN NN NN NN NN NN CC NN CC CC NN NN CC NN NN CC CC CC CC CC CC CC NN NN NN NN NN NN NN CC CC CC CC CC CC NN NN NN NN NN NN NN NN NN NN CC CC CC CC CC NN CC NN NN NN CC
Does anyone have a script for this kind of conversion, along with a batch command to run it?
Thank you
Best
I am not familiar with the PLINK formats. Could you provide an example file or snippet, or a link to an explanation of the format? Then I bet someone here can show you how to write a converter. They seem to be discussing the same issue here. Does that give you any pointers?
Also, there are a large number of posts on Biostar about converting to and from PLINK format, including VCF to PLINK, IMPUTE2 to PLINK, raw GWAS to PLINK, Illumina raw genotype to PLINK, .pre files to PLINK, HapMap to PLINK with PEAS, SNPTEST format to PLINK, BED to PLINK, and Affymetrix in HapMap format to PLINK. And that is just on the first two of four pages of search results.
Yes, I searched Biostar and found the links you referred to, but none of them matched my question. In the end I solved the problem using GLU and a Perl script.
Thank you anyway.
Glad you figured it out. Why not post your solution as an answer to your question?
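For anyone landing here with the same file layout, below is a minimal Python sketch of one possible conversion. It is not the GLU and Perl solution mentioned above, which was never posted. It assumes the layout shown in the question: a header line with 11 metadata columns followed by sample IDs, then one whitespace-separated row per SNP with two-letter genotypes where N means missing. It writes PLINK's transposed text format (.tped/.tfam). The function names and the N_META constant are made up for illustration, and family ID, parents, sex and phenotype are filled with placeholder values because the file does not carry them.

```python
N_META = 11  # assumption: metadata columns before the first sample ID, as in the header above


def hapmap_header_to_tfam(header_line):
    """Build .tfam lines (FID IID PID MID SEX PHENO) from the HapMap header."""
    samples = header_line.split()[N_META:]
    # Placeholders: family ID = sample ID, unknown parents (0 0),
    # unknown sex (0), missing phenotype (-9).
    return [f"{s} {s} 0 0 0 -9" for s in samples]


def hapmap_row_to_tped(row_line):
    """Build one .tped line (CHR SNP CM BP a1 a2 a1 a2 ...) from a HapMap SNP row."""
    fields = row_line.split()
    rsid, chrom, pos = fields[0], fields[2], fields[3]
    if chrom.startswith("chr"):
        chrom = chrom[3:]  # "chr1" -> "1"
    alleles = []
    for genotype in fields[N_META:]:
        if len(genotype) != 2:
            raise ValueError(f"unexpected genotype {genotype!r} for {rsid}")
        # PLINK's default missing-allele code is 0; HapMap uses N.
        alleles.extend("0" if allele == "N" else allele for allele in genotype)
    # Genetic distance is not in the file, so it is written as 0.
    return " ".join([chrom, rsid, "0", pos] + alleles)


def convert(hapmap_path, out_prefix):
    """Write out_prefix.tped and out_prefix.tfam from one HapMap genotype file."""
    with open(hapmap_path) as src:
        header = next(src)
        with open(out_prefix + ".tfam", "w") as tfam:
            tfam.write("\n".join(hapmap_header_to_tfam(header)) + "\n")
        with open(out_prefix + ".tped", "w") as tped:
            for line in src:
                if line.strip():  # skip blank lines
                    tped.write(hapmap_row_to_tped(line) + "\n")
```

After calling convert("genotypes.txt", "out") on a file like the one above (the file name is only an example), something like "plink --tfile out --make-bed --out out" should load the result into PLINK 1.x, and a shell loop over the per-chromosome files would cover the batch part. Check the chromosome codes and the strand column against your own data before relying on the output.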