Hello again
I have to convert Illumina HumanHap chip data into PLINK format (a PED file). I proceeded as described here, but my generated PED file shows only 0 for each genotype. PLINK warns during the process:
[...] 50 males, 50 females, and 0 of unspecified sex
Before frequency and genotyping pruning, there are 1000000 SNPs
100 founders and 0 non-founders found
1000000 SNPs with no founder genotypes observed
Warning, MAF set to 0 for these SNPs (see --nonfounders)
Writing list of these SNPs to [ plink.nof ]
Total genotyping rate in remaining individuals is 0 [...]
fam-file:
1 192 0 0 1 0
2 193 0 0 2 0
3 213 0 0 1 0
4 214 0 0 1 0
map-file:
1 rs3934834 0 995669
1 rs3737728 0 1011278
1 rs6687776 0 1020428
1 rs9651273 0 1021403
lgen-file:
[Header]
BSGT Version 3.0.27
Processing Date
Content
Num SNPs 1000000
Total SNPs 1000000
Num Samples 100
Total Samples 100
[Data]
Sample Index Sample Name SNP Name Allele1 Allele2
1 192 rs10000010 A G
2 193 rs10000010 A G
3 213 rs10000010 A G
My LGEN file has a 10-row header, followed by the data rows. The genotypes are given as forward alleles exported via BeadStudio (exporting Top alleles gives the same sobering result).
After running PLINK to reconstruct the PED file, I get this PED file with missing genotypes:
1 192 0 0 1 -9 0 0 0 0 0 0 0 0 [...]
2 193 0 0 2 -9 0 0 0 0 0 0 0 0 [...]
3 213 0 0 1 -9 0 0 0 0 0 0 0 0 [...]
Perhaps one of you can spot the mistake or has an idea how to solve the problem. Do I need a reference file, or is the header in the LGEN file the problem? Thank you very much.
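One thing worth ruling out is the report header itself: PLINK's LGEN format is a plain five-column file (family ID, individual ID, SNP, allele 1, allele 2) with no header lines. The following is a minimal Python sketch, not a definitive converter, that drops everything up to and including the `[Data]` line plus the column-title line after it. The function name `strip_beadstudio_header` is made up for illustration, and the sketch assumes that sample names contain no spaces and that the first two columns already match the FAM file, as they do in the excerpt above.

```python
import io


def strip_beadstudio_header(src, dst, marker="[Data]"):
    """Copy genotype rows from src to dst, dropping the report header.

    Skips every line up to and including the marker line, then skips the
    column-title line that follows it. Returns the number of rows written.
    """
    in_data = False
    skipped_titles = False
    written = 0
    for line in src:
        if not in_data:
            if line.strip() == marker:
                in_data = True
            continue
        if not skipped_titles:
            skipped_titles = True  # the "Sample Index  Sample Name ..." line
            continue
        fields = line.split()
        if len(fields) != 5:
            continue  # ignore blank or malformed rows
        dst.write(" ".join(fields) + "\n")
        written += 1
    return written


# Small in-memory demonstration using rows from the question.
report = io.StringIO(
    "[Header]\n"
    "Num SNPs 1000000\n"
    "[Data]\n"
    "Sample Index Sample Name SNP Name Allele1 Allele2\n"
    "1 192 rs10000010 A G\n"
    "2 193 rs10000010 A G\n"
)
cleaned = io.StringIO()
rows_written = strip_beadstudio_header(report, cleaned)
print(rows_written)
print(cleaned.getvalue(), end="")
```

For a real report, the two `StringIO` objects would be replaced by file handles opened for reading and writing, so the file is processed line by line and never has to fit in memory.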
Perfect, thank you. PLINK starts to work now, but there is a new error: my file has too many alleles.
I've edited my answer to address the issue.
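To see which markers trigger a complaint about too many alleles, it can help to list the allele codes that actually occur per SNP. The sketch below is only a diagnostic under stated assumptions: the input is headerless five-column LGEN-style rows, `0` is the missing-genotype code (PLINK's default), and the function names are invented for this example. A no-call code such as `-` would show up as a third "allele".

```python
from collections import defaultdict


def alleles_per_snp(rows):
    """Map each SNP name to the set of allele codes seen in LGEN-style rows."""
    seen = defaultdict(set)
    for line in rows:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed rows
        snp, a1, a2 = fields[2], fields[3], fields[4]
        seen[snp].update((a1, a2))
    return seen


def snps_with_extra_alleles(rows, missing="0"):
    """Return {snp: sorted codes} for SNPs with more than two non-missing codes."""
    result = {}
    for snp, codes in alleles_per_snp(rows).items():
        real = codes - {missing}
        if len(real) > 2:
            result[snp] = sorted(real)
    return result


# Hypothetical rows: the second sample has a no-call written as "-".
example_rows = [
    "1 192 rs10000010 A G",
    "2 193 rs10000010 - -",
    "3 213 rs10000010 A A",
]
print(snps_with_extra_alleles(example_rows))
```

Passing an open file handle instead of the list works the same way, since the functions only iterate over lines.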
Nice, I had the same idea. I use Windows, though, so do you have a script for PLINK or Perl?
If you can manage to open the file in a text editor and perform "find and replace" on all "-" to "0", I think that should work. Otherwise, if you are going to do much bioinformatics work on Windows, I would suggest installing and becoming familiar with Cygwin.
Unfortunately it's too big; I can't open it in Notepad.
As a slightly less intimidating alternative to installing Cygwin for sed functionality, you can probably use this blog post about PowerShell.
Hehe, thanks, I'll check this. Now I've got it working with Perl.
Hi. I'm facing a similar issue to the above. I have made a .ped file from a BeadStudio report, but my missing values are specified as "-" rather than 0. The file is too big to find and replace using Nano, and the above Perl command replaces only the first occurrence on each line (in this case changing the phenotype specification "-9" to "09"). I'm not familiar with Perl or command-line operations and wondered if anyone could help?
I've managed to get around the phenotype issue by using:
But this is still only dealing with the first occurrences in the PED file.
Fixes this for anyone encountering a similar problem.
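For readers without sed or Perl at hand, the same replacement can be sketched in Python. This is one possible approach, not the command used in this thread: it streams the file line by line, so size is not a problem, and it compares whole whitespace-separated fields, so every standalone "-" becomes "0" while the phenotype code "-9" is left alone. The function name `recode_missing` is hypothetical, and the output is written space-delimited, which assumes PLINK's usual tolerance for either spaces or tabs.

```python
import io


def recode_missing(src, dst, old="-", new="0"):
    """Stream src to dst, replacing whole fields equal to `old` with `new`.

    Comparing complete fields (rather than characters) leaves values such
    as "-9" untouched. Returns the number of fields replaced.
    """
    replaced = 0
    for line in src:
        fields = line.split()
        for i, field in enumerate(fields):
            if field == old:
                fields[i] = new
                replaced += 1
        dst.write(" ".join(fields) + "\n")
    return replaced


# In-memory demonstration with made-up genotypes.
ped_in = io.StringIO(
    "1 192 0 0 1 -9 A G - - C C\n"
    "2 193 0 0 2 -9 - - A A - -\n"
)
ped_out = io.StringIO()
count = recode_missing(ped_in, ped_out)
print(count)
print(ped_out.getvalue(), end="")
```

With real data, `ped_in` and `ped_out` would be file handles for the original and the corrected PED file; writing to a new file rather than editing in place keeps the original available if something goes wrong.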