Dear all,
I created PED and MAP files in the shell. When I read them with PLINK, I get the error below:
./plink --file test --noweb --missing-genotype N
54241 (of 54241) markers to be included from [ test.map ]
ERROR: Locus 1 has >2 alleles: individual R921C12 273487 has genotype [ T C ] but we've already seen [ - ] and [ T ]
I checked my file and it seems OK: the data is indeed 'CC', with no -'s or T's nearby! The length of each line (i.e. for each individual) is consistent throughout. I've tried both tab- and space-delimited files, but it makes no difference. I don't understand why I get this error. This is the row that triggers the problem:
R921C12 273487 2950577 2950350 1 Resistant C C T T T T C C T T . . .
Any ideas?
Is this the first sample in your file, or are there others? PLINK may have encountered the '-' allele in another sample before this one. Also, are you sure that missing genotypes are encoded as 'N' in your data? PLINK normally expects '0' for a missing genotype ('-9' is its default missing-phenotype code).
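One way to check this is to list, for a single locus, every allele code together with the first sample that carries it. The sketch below is only an illustration: it assumes a whitespace-delimited PED file with six leading columns, the function name `alleles_at_locus` is made up, and the leading columns of the first demo line are placeholders because that sample's real values are not shown in this thread.

```python
def alleles_at_locus(ped_lines, locus=1, n_leading=6):
    """Map each allele code seen at a 1-based locus to the first sample carrying it."""
    first_seen = {}
    start = n_leading + 2 * (locus - 1)  # index of the first allele of this locus
    for line in ped_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        sample_id = fields[0]
        for allele in fields[start:start + 2]:
            first_seen.setdefault(allele, sample_id)
    return first_seen


demo = [
    "R923A04 0 0 0 0 -9 - - - - - -",  # placeholder leading columns (assumption)
    "R921B12 273504 2910033 2910215 1 Resistant C C T T C C",
    "R921C12 273487 2950577 2950350 1 Resistant C C T T T T",
]
print(alleles_at_locus(demo, locus=1))  # {'-': 'R923A04', 'C': 'R921B12'}
```

If '-' shows up in the result, it is being read as a real allele, so it would either have to be declared via `--missing-genotype` or recoded before loading.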
Hi, this is the 2nd sample in the PED file. Let me write out the commands I used to make a.ped:
1- I have this file:
SNP_Name,Chr,Coordinate,R923A04,R921B12,R921C12,R921D12,R921E12
CL635944_160.1,0,0,--,CC,CC,CC,TC,TC
CR_594.1,0,0,--,TT,TT,TT,TT,TT
CR_816.1,0,0,--,CC,TT,TT,TT,TT
2- I use these two commands:
3- python -c "import sys; print('\n'.join(' '.join(c) for c in zip(*(l.split() for l in sys.stdin.readlines() if l.strip()))))" < a3 > a4
R921B12 CC TT CC TC TT . . .
R921C12 CC TT TT CC TT . . .
4- perl -ne '($id, $tmp) = split( / /, $_, 2 ); $tmp =~ s/ //g; print "$id "; print join(" ", split( //, $tmp ) );' a4 >a5
R921B12 C C T T C C T C T T . . .
R921C12 C C T T T T C C T T . . .
5- join <(sed -e 's/\t/ /g' 6col_ped | sort -k 1) <(sort -k 1 a5) > a6
R921B12 273504 2910033 2910215 1 Resistant C C T T C C T C T T . . .
R921C12 273487 2950577 2950350 1 Resistant C C T T T T C C T T . . .
This is the PED file that gives me the error. Missing data in the PED file is coded as '-'. Maybe you are right and I could not separate the rows from each other. I have written out what I did; maybe I made a mistake somewhere?
I was able to input your data like this:
However, something does not make sense with your data. For any PLINK data input, you should have a .map and .ped file.
I do not know what your input data is, but it has 3 columns named SNP_Name, Chr, and Coordinate.
That is enough to create the MAP file.
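As a rough sketch of that step, the three leading columns can be rearranged into the four MAP columns (chromosome, SNP ID, genetic distance, base-pair position). The function name `map_rows` is hypothetical, and setting the genetic distance to 0 is an assumption, since the input has no such column.

```python
import csv
import io


def map_rows(csv_text, genetic_distance="0"):
    """Build MAP rows (chrom, SNP ID, genetic distance, bp position) from the CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header line
    rows = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        snp, chrom, pos = row[:3]  # SNP_Name, Chr, Coordinate
        rows.append((chrom, snp, genetic_distance, pos))
    return rows


demo = """SNP_Name,Chr,Coordinate,R923A04,R921B12,R921C12,R921D12,R921E12
CL635944_160.1,0,0,--,CC,CC,CC,TC,TC
CR_594.1,0,0,--,TT,TT,TT,TT,TT
CR_816.1,0,0,--,CC,TT,TT,TT,TT
"""
for fields in map_rows(demo):
    print("\t".join(fields))
```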
It then has the columns R923A04, R921B12, R921C12, R921D12, and R921E12:
These must be sample IDs. In your original data (a.tsv), these columns represent the sample genotypes. In the plink input file, a6, your samples should be represented on rows.
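To illustrate the samples-on-rows layout, here is a minimal sketch that transposes the comma-separated genotype table and splits each two-letter genotype into two alleles. It is not the original commands: the name `ped_genotype_rows` is made up, recoding '-' to '0' is my assumption (so that PLINK's default missing-genotype code applies), and the demo is a three-sample excerpt of the table posted above. The six leading PED columns would still have to be joined on afterwards.

```python
import csv
import io


def ped_genotype_rows(csv_text, missing_in="-", missing_out="0"):
    """Return one 'sample allele allele ...' line per sample column of the CSV."""
    rows = [r for r in csv.reader(io.StringIO(csv_text)) if r]
    header, markers = rows[0], rows[1:]
    lines = []
    # Sample columns start after SNP_Name, Chr, Coordinate.
    for col, sample in enumerate(header[3:], start=3):
        alleles = []
        for marker in markers:
            for allele in marker[col]:  # 'CC' -> 'C', 'C'
                alleles.append(missing_out if allele == missing_in else allele)
        lines.append(" ".join([sample] + alleles))
    return lines


demo = """SNP_Name,Chr,Coordinate,R923A04,R921B12,R921C12
CL635944_160.1,0,0,--,CC,CC
CR_594.1,0,0,--,TT,TT
CR_816.1,0,0,--,CC,TT
"""
for line in ped_genotype_rows(demo):
    print(line)
```

With this excerpt, the R921B12 and R921C12 lines start with the same alleles as the a5 output shown earlier in the thread, while the R923A04 line contains only the recoded missing code.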
Let me know if any of this helps.
Kevin
Hi, I checked everything that may be related to this problem, but unfortunately it still doesn't work. The first column (sample IDs) is the Family ID in the PED file, so I think it is not related. Maybe I used a wrong command in this step (python -c "import sys; print('\n'.join(' '.join(c) for c in zip(*(l.split() for l in sys.stdin.readlines() if l.strip()))))" < a3 > a4), which caused two rows not to be separated.
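A simple way to test that suspicion is to count the fields on every line: with six leading columns, each PED line should have 6 + 2 x (number of markers) fields, and a line made of two merged rows would have twice that. The helper name `bad_ped_lines` below is hypothetical, and the demo uses only the first three markers of the rows shown above.

```python
def bad_ped_lines(ped_lines, n_markers, n_leading=6):
    """Return (line number, field count) for lines whose field count is unexpected."""
    expected = n_leading + 2 * n_markers
    problems = []
    for number, line in enumerate(ped_lines, start=1):
        n_fields = len(line.split())
        if n_fields and n_fields != expected:  # blank lines are ignored
            problems.append((number, n_fields))
    return problems


row_b = "R921B12 273504 2910033 2910215 1 Resistant C C T T C C"
row_c = "R921C12 273487 2950577 2950350 1 Resistant C C T T T T"

print(bad_ped_lines([row_b, row_c], n_markers=3))        # []
print(bad_ped_lines([row_b + " " + row_c], n_markers=3))  # [(1, 24)]
```

For the real file, `n_markers` would be the number of lines in test.map (54241 according to the PLINK log).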