Recode SNPs dataset to Number
7.2 years ago

Dear All,

How do I recode a SNP dataset based on major allele, minor allele and heterozygote when the data also contain nucleotide codes such as W, N and Y? Usually, SNP genotypes like AA, AG, AA, AA, AT, AA, GG, GG, AA, GG, AA, AA are recoded to 0, 1, 0, 0, 1, 0, 2, 2, 0, 2, 0, 0 because A is the major allele and G is the minor allele. What if I have SNP codes like T, T, T, W, N, A, T, T, W? Previously, I used the recodeSNPs function from the scrime package in R to do this. Unfortunately, it does not work for this data.
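For reference, the usual two-letter recoding described above can be sketched in a few lines of Python. This is only an illustration, not the scrime implementation: the function name `recode_by_major_allele` is made up here, and it assumes the major allele is simply the most frequent letter across all genotypes, with each genotype coded as its number of non-major alleles.

```python
from collections import Counter

def recode_by_major_allele(genotypes):
    """Return the number of non-major alleles (0, 1 or 2) per two-letter genotype."""
    # Assumption: the major allele is the most frequent letter over all calls.
    allele_counts = Counter("".join(genotypes))
    major = allele_counts.most_common(1)[0][0]
    # Count the alleles in each genotype that differ from the major allele.
    return [sum(allele != major for allele in g) for g in genotypes]

calls = ["AA", "AG", "AA", "AA", "AT", "AA", "GG", "GG", "AA", "GG", "AA", "AA"]
print(recode_by_major_allele(calls))
# [0, 1, 0, 0, 1, 0, 2, 2, 0, 2, 0, 0]
```

Under this counting rule AT is coded 1 just like AG, which matches the example coding in the question.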

SNP major allele minor allele recode SNPs • 1.9k views
7.2 years ago
pfs ▴ 280

If I am understanding the question correctly, you should be able to use 'sed' to do what you want. The command below is untested but should work.

sed -e 's/W/0/g' -e 's/N/1/g' -e 's/Y/2/g' file.txt > new_file.txt


Thank you very much for your answer. I cannot simply encode W as 0 and so on, because 0 stands for homozygous reference, 1 for heterozygous and 2 for homozygous variant. I'm sorry that my question was not clear. The encoding is based not on major and minor alleles but on reference and variant alleles. My problem is how to decide which calls are homozygous reference, heterozygous or homozygous variant when the genotype data consist of nucleotide symbols beyond A, T, G and C. The symbols in my data are IUPAC nucleotide codes.
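One possible way to approach this is sketched below in Python. It rests on several assumptions that are not stated in the thread: the data are diploid; an unambiguous letter (A, C, G, T) means a homozygous call; a two-base IUPAC code (R, Y, S, W, K, M) means a heterozygous call; N and any other symbol are treated as missing; and the reference allele is known for each SNP (T is assumed for the example). The names `IUPAC_DIPLOID` and `recode_iupac` are made up for illustration.

```python
# Assumed expansion of single-letter calls into diploid genotypes.
IUPAC_DIPLOID = {
    "A": "AA", "C": "CC", "G": "GG", "T": "TT",  # homozygous calls
    "R": "AG", "Y": "CT", "S": "CG",             # heterozygous calls
    "W": "AT", "K": "GT", "M": "AC",
}

def recode_iupac(calls, ref):
    """Code each call as its number of non-reference alleles; None if missing."""
    coded = []
    for call in calls:
        alleles = IUPAC_DIPLOID.get(call.upper())
        if alleles is None:
            # N, three-base codes and unknown symbols are treated as missing.
            coded.append(None)
        else:
            coded.append(sum(allele != ref for allele in alleles))
    return coded

calls = ["T", "T", "T", "W", "N", "A", "T", "T", "W"]
print(recode_iupac(calls, ref="T"))
# [0, 0, 0, 1, None, 2, 0, 0, 1]
```

Note that this counts non-reference alleles, so a heterozygous call that does not contain the reference base at all (for example W when the reference is G) would come out as 2. Whether that is the desired behaviour depends on how multi-allelic sites should be handled.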
