Dear All,
How do I recode a SNP dataset as major-allele homozygote, heterozygote and minor-allele homozygote when the data also contain the nucleotide codes W, N and Y? Usually, SNP genotypes such as AA, AG, AA, AA, AT, AA, GG, GG, AA, GG, AA, AA are recoded to 0, 1, 0, 0, 1, 0, 2, 2, 0, 2, 0, 0, because A is the major allele and G is the minor allele. What if my SNP calls look like T, T, T, W, N, A, T, T, W instead? Previously I used the recodeSNPs function from the scrime package in R for this, but unfortunately it does not work for these data.
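For reference, the usual major-allele recoding of two-letter genotypes can be sketched as follows. This is a minimal Python illustration, not the scrime implementation: the function name `recode_major` is hypothetical, `None` is assumed to mark a missing genotype, and each genotype is coded as the number of non-major alleles it carries (so a heterozygote such as AT also gets 1, as in the example above).

```python
from collections import Counter

def recode_major(genotypes):
    """Code each two-letter genotype as its count of non-major alleles (0, 1 or 2).

    Missing genotypes (None) are skipped when counting and stay None.
    On a tie, Counter.most_common returns the allele seen first.
    """
    counts = Counter(a for g in genotypes if g is not None for a in g)
    major, _ = counts.most_common(1)[0]
    return [None if g is None else sum(a != major for a in g) for g in genotypes]

snp = ["AA", "AG", "AA", "AA", "AT", "AA", "GG", "GG", "AA", "GG", "AA", "AA"]
print(recode_major(snp))  # [0, 1, 0, 0, 1, 0, 2, 2, 0, 2, 0, 0]
```

This only works once every call is a pair of bases, which is exactly what single-letter IUPAC data such as W or N are not.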
Thank you very much for your answer. I cannot simply encode W as 0 (or as anything else) like that, because 0 denotes homozygous reference, 1 heterozygous and 2 homozygous variant. I'm sorry my question was unclear: the encoding is based not on major and minor alleles but on reference and variant alleles. My problem is how to determine homozygous reference, heterozygous and homozygous variant from genotype data that consist of IUPAC nucleotide symbols, not only A, C, G and T.
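One possible approach, sketched below under stated assumptions: each IUPAC two-base code (R, Y, S, W, K, M) is read as a heterozygote of those two bases, A/C/G/T as homozygotes, and N (or any other symbol) as a missing call. The reference allele of each SNP must come from outside the genotype data (for example a reference genome or the SNP annotation); here it is passed in by hand. The names `IUPAC` and `recode_reference` are hypothetical.

```python
# Assumed interpretation of IUPAC symbols as diploid genotypes:
# single bases are homozygous, two-base ambiguity codes are heterozygous.
IUPAC = {
    "A": "AA", "C": "CC", "G": "GG", "T": "TT",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
}

def recode_reference(calls, ref):
    """Code each call as its count of non-reference alleles (0, 1 or 2).

    N, gaps and three-base codes (B, D, H, V) are treated as missing (None).
    """
    out = []
    for c in calls:
        pair = IUPAC.get(c.upper())
        if pair is None:
            out.append(None)  # missing or uninterpretable call
        else:
            out.append(sum(base != ref for base in pair))
    return out

calls = ["T", "T", "T", "W", "N", "A", "T", "T", "W"]
print(recode_reference(calls, ref="T"))  # [0, 0, 0, 1, None, 2, 0, 0, 1]
```

Note that a heterozygote of two non-reference alleles (e.g. S = C/G when the reference is T) also comes out as 2 here, so multi-allelic sites may need separate handling.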