Hi All,
I have an Illumina genotype final report, and I want to convert it to .ped using plink. I prepared three files: fam, lgen, and map (please find a snapshot of these files below). Then I ran plink, but I got the error "Variant 'rs1000002' in .lgen file has 3+ different alleles", although, as you can see from the first line of the lgen file, variant 'rs1000002' has only two alleles.
$ plink --lfile 200638550003_R01C01 --no-parents --no-sex --no-pheno --recode
PLINK v1.90b3.32 64-bit (24 Feb 2016)  https://www.cog-genomics.org/plink2
(C) 2005-2016 Shaun Purcell, Christopher Chang  GNU General Public License v3
Logging to plink.log.
Options in effect:
  --lfile 200638550003_R01C01
  --no-parents
  --no-pheno
  --no-sex
  --recode

128715 MB RAM detected; reserving 64357 MB for main workspace.
Processing .lgen file... 0%
Error: Variant 'rs1000002' in .lgen file has 3+ different alleles.
lgen_file:
200638550003_R01C01 NA1 rs1000002 A G
200638550003_R01C01 NA2 rs1000003 A G
200638550003_R01C01 NA3 rs10000030 A G
fam_file:
200638550003_R01C01 NA1 0 0 0 0
200638550003_R01C01 NA2 0 0 0 0
200638550003_R01C01 NA3 0 0 0 0
map_file:
3 rs1000002 0 183635768
3 rs1000003 0 98342907
4 rs10000030 0 103374154
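The first line alone cannot show whether a variant has only two alleles, since the error concerns every row for that variant ID. One way to check the whole .lgen is to count distinct allele codes per variant. The following is a minimal Python sketch, not part of plink; the function names are hypothetical, and it assumes whitespace-delimited columns in the order family ID, individual ID, variant ID, allele 1, allele 2, with 0 as the missing-allele code.

```python
from collections import defaultdict


def alleles_per_variant(lgen_lines, missing="0"):
    """Map each variant ID to the set of distinct non-missing allele codes.

    Assumes five whitespace-delimited columns per row:
    family ID, individual ID, variant ID, allele 1, allele 2.
    """
    seen = defaultdict(set)
    for line in lgen_lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or truncated rows
        variant, a1, a2 = fields[2], fields[3], fields[4]
        for allele in (a1, a2):
            if allele != missing:
                seen[variant].add(allele)
    return seen


def variants_with_extra_alleles(lgen_lines, missing="0"):
    """Return {variant ID: sorted allele codes} for variants with 3+ codes."""
    counts = alleles_per_variant(lgen_lines, missing)
    return {v: sorted(a) for v, a in counts.items() if len(a) > 2}


if __name__ == "__main__":
    import sys

    # Usage: python check_lgen.py 200638550003_R01C01.lgen
    with open(sys.argv[1]) as handle:
        for variant, alleles in variants_with_extra_alleles(handle).items():
            print(variant, " ".join(alleles))
```

Any variant printed by this script appears with three or more distinct codes somewhere in the file, which would include non-nucleotide codes if the report contains them.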
Can you send me the link for this newer build? Also, can I do this with plink2? I searched plink2 but I could not find the --file flag. Thanks
This is downloadable from https://www.cog-genomics.org/plink/1.9/ .
(As for why plink 2.0 doesn't have the --file flag yet, that's lower priority because plink 1.9 already has a good implementation. The first development priority is enabling important things that can't be done at all with plink 1.9.)
Hi Christopher, how can I create an lgen file that has only one individual genotyped for 300,000 SNPs? I mean, in the context of sample IDs, what should I use? I created the lgen and fam files below. Then I used plink 1.9 to generate the ped file (log file shown below). The problem is that the genotypes in the ped file do not match the genotypes in the lgen file.
lgen_file:
200638550003_R01C01 200638550003_R01C01 rs1000002 A G
200638550003_R01C01 200638550003_R01C01 rs1000003 A G
200638550003_R01C01 200638550003_R01C01 rs10000030 A G
200638550003_R01C01 200638550003_R01C01 rs10000037 G G
fam_file:
200638550003_R01C01 200638550003_R01C01 0 0 0 0
map_file:
3 rs1000002 0 183635768
3 rs1000003 0 98342907
4 rs10000030 0 103374154
4 rs10000037 0 38924330
ped_file:
200638550003_R01C01 200638550003_R01C01 0 0 0 -9 A A G A G A G A G G G A G A A A A A G G
log_file:
PLINK v1.90b4.6 64-bit (15 Aug 2017)
Options in effect:
  --lfile test_1
  --no-fid
  --no-parents
  --no-pheno
  --no-sex
  --out test_1_1.9
  --recode
Hostname: quser11
Working directory: /projects/b1042/BurridgeLab
Start time: Wed Sep 13 15:23:58 2017
Random number seed: 1505334238
128715 MB RAM detected; reserving 64357 MB for main workspace.
Processing .lgen file... done.
--lfile: test_1_1.9-temporary.bed + test_1_1.9-temporary.bim +
test_1_1.9-temporary.fam written.
10 variants loaded from .bim file.
1 person (0 males, 0 females, 1 ambiguous) loaded from .fam.
Ambiguous sex ID written to test_1_1.9.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 1 founder and 0 nonfounders present.
Calculating allele frequencies... done.
10 variants and 1 person pass filters and QC.
Note: No phenotypes present.
--recode ped to test_1_1.9.ped + test_1_1.9.map ... done.
Thanks,
Tarek
Well, it doesn't look like you posted the entire .lgen or .map files (only 4 variants, when the .log clearly indicates there were 10), so there's no way to tell if they match.
Note that the .map file you did post has out-of-order variants, and plink automatically sorts them by position before generating the new .ped + .map fileset.
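Because of that sorting, the genotype columns in the new .ped follow the order of the new .map, not the order of the rows in the .lgen. In the posted .map, for example, rs1000002 (position 183635768) is listed before rs1000003 (position 98342907) on chromosome 3, so those two swap places after sorting. A minimal Python sketch of a comparison keyed on variant ID rather than on position in the file is given here. The function names are hypothetical, and it assumes a .ped with six leading columns followed by two allele columns per variant, a four-column .map, and that allele order within a genotype does not matter.

```python
def read_map_ids(map_lines):
    """Return variant IDs (second column) in .map order."""
    return [line.split()[1] for line in map_lines if line.strip()]


def ped_genotypes(ped_line, variant_ids):
    """Pair each variant ID with its two alleles from one .ped row.

    Assumes six leading columns (FID, IID, father, mother, sex, phenotype).
    """
    alleles = ped_line.split()[6:]
    if len(alleles) != 2 * len(variant_ids):
        raise ValueError("allele count does not match the number of variants")
    return {
        vid: (alleles[2 * i], alleles[2 * i + 1])
        for i, vid in enumerate(variant_ids)
    }


def lgen_genotypes(lgen_lines, iid):
    """Collect {variant ID: (allele 1, allele 2)} for one individual ID."""
    calls = {}
    for line in lgen_lines:
        fields = line.split()
        if len(fields) >= 5 and fields[1] == iid:
            calls[fields[2]] = (fields[3], fields[4])
    return calls


def mismatches(ped_calls, lgen_calls):
    """List variant IDs whose unordered genotype differs or is absent."""
    bad = []
    for vid, call in ped_calls.items():
        expected = lgen_calls.get(vid)
        if expected is None or sorted(call) != sorted(expected):
            bad.append(vid)
    return sorted(bad)


if __name__ == "__main__":
    # Toy data for illustration only; sample ID "S1" is made up.
    new_map = ["3 rs1000003 0 98342907", "3 rs1000002 0 183635768"]
    new_ped = "S1 S1 0 0 0 -9 G A A A"
    lgen = ["S1 S1 rs1000002 A A", "S1 S1 rs1000003 A G"]
    ids = read_map_ids(new_map)
    print(mismatches(ped_genotypes(new_ped, ids), lgen_genotypes(lgen, "S1")))
```

The important detail is that the variant IDs come from the .map written by --recode, so each pair of .ped columns is matched to the right variant before it is compared with the .lgen.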