Dear All,
I have taken a PLINK PED/MAP file and converted it into a .gen/.sample file with gtool. The resulting .gen file looks like this:
pkd@bioinform:~/strand_correct_script/Files_during_updating$ head controls.gen | cut -d " " -f 1-20
5 chr5:96000607 96000607 A G 1 0 0 1 0 0 1 0 0 0 1 0 1 0 0
5 rs1421911 96000947 C T 0 1 0 0 0 1 0 1 0 0 0 1 0 1 0
5 rs6860934 96001842 C T 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1
I understand from the IMPUTE website that the first column should be SNP1, SNP2, SNP3 and so on, but I pushed on, thinking things might sort themselves out. I imputed with IMPUTE2 against 1000 Genomes, which produced a .gen file in this format:
pkd@bioinform:~/Impute2/converting_back_to_plink$ head European_imputed_controls.gen | cut -d " " -f 1-20
--- 5-96000097 96000097 A G 1 0 0 1 0 0 0.976 0.024 0 1 0 0 1 0 0
--- 5-96000203 96000203 C T 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0
--- 5-96000264 96000264 C T 1 0 0 1 0 0 0.998 0.002 0 1 0 0 1 0 0
--- rs7733671 96000269 G A 0 1 0 0 1 0 0 0.947 0.052 1 0 0 0 0 1
--- 5-96000338 96000338 C A 1 0 0 1 0 0 0.997 0.003 0 1 0 0 1 0 0
--- rs73774358 96000463 A G 1 0 0 1 0 0 0.985 0.015 0 1 0 0 1 0 0
--- 5-96000525 96000525 G A 1 0 0 1 0 0 0.997 0.003 0 1 0 0 1 0 0
5 chr5:96000607 96000607 A G 1 0 0 1 0 0 1 0 0 0 1 0 1 0 0
--- rs73774359 96000658 A C 1 0 0 1 0 0 0.985 0.015 0 1 0 0 1 0 0
When I tried to convert it back to PLINK PED/MAP, PLINK said that all the SNPs had the same name, "---", and crashed in flames. I have read the gtool site and cannot find any reference to what it puts in the first column when it converts from PLINK to GEN/SAMPLE, or to what IMPUTE2 should put there when it imputes new SNPs. I can load the file into R and put an arbitrary first column in, but I was wondering whether this is necessary or whether I have made an error somewhere.
Thank you in advance.
Philip
He did use gtool; the problem is that SNPs get called "---" in the map if there is no rsID. My script above will convert these SNPs to chr:pos format to give them a unique ID and get you through PLINK.
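For anyone without that script to hand, here is a minimal Python sketch of the same idea. It is not the script referred to above. It assumes the .gen file is space-delimited with the position in the third column, that the whole file covers a single chromosome (passed in by hand, since the imputed rows do not carry it), and that the first column is what ends up as the SNP name. The function names, the `placeholders` parameter and the output filename are made up for illustration.

```python
def fix_snp_id(line, chrom, placeholders=("---",)):
    """Return the .gen line with a placeholder first column replaced by chrom:pos."""
    fields = line.rstrip("\n").split(" ")
    # Column 1 is the ID to repair; column 3 (index 2) is the base-pair position.
    if fields[0] in placeholders:
        fields[0] = f"{chrom}:{fields[2]}"
    return " ".join(fields)


def fix_gen_file(in_path, out_path, chrom, placeholders=("---",)):
    """Stream a .gen file line by line, rewriting placeholder IDs (assumes one chromosome)."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.strip():  # skip blank lines
                dst.write(fix_snp_id(line, chrom, placeholders) + "\n")


# Example call (the output filename is hypothetical):
# fix_gen_file("European_imputed_controls.gen",
#              "European_imputed_controls.fixed.gen", chrom="5")
```

Note that the genotyped SNPs in the file above have "5" in the first column, so if the converter takes its names from that column those rows would still share a name. Passing `placeholders=("---", "5")` rewrites them too. Two variants at the same position would still get the same chr:pos ID, so it is worth checking for duplicates afterwards.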