I am extracting SNP data from various published datasets. Those provided in plink format, or as matrices with genotypes coded as 0 1 2, are absolutely fine. However, some of the matrices have genotypes coded as nucleotides (A G T C), and I am struggling to find a conversion tool that works for these. On many occasions it is not viable for me to manually convert these datasets into plink format because I don't always have all the necessary data (e.g., often there is just a matrix of genotypes and no other information).
Has anyone had any success with a package, or does anyone know of a function that I could use to recode the A G C T matrix as 0 1 2?
Thanks in advance!
In your comment on SO you say "checkATCG function called by recodeSNPs rejects a matrix with anything other than A T G and C", even though the docs say it should work. Maybe you could contact the author for clarification.
Thanks -- I emailed the author a few days ago (I've also emailed the author for the snpReady package). If/when they reply, I'll update my post on SO and here.
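In the meantime, if all you have is the genotype matrix, the recoding itself is simple enough to do by hand. Below is a minimal sketch in plain Python (not R, and not taken from any of the packages mentioned here). It rests on several assumptions that may not match your files: rows are samples and columns are SNPs, each cell is a two-letter genotype such as `AG` (optionally written `A/G` or `A|G`), and 0 1 2 means the number of copies of the minor allele, which is estimated from the data itself because no reference allele is available. Anything that is not two letters from A C G T is treated as missing rather than rejected. The function names `clean`, `recode_snp` and `recode_matrix` are made up for this example.

```python
from collections import Counter

VALID = set("ACGT")


def clean(call):
    """Return a two-letter genotype such as 'AG', or None if the call is unusable."""
    call = str(call).upper().replace("/", "").replace("|", "").strip()
    if len(call) == 2 and set(call) <= VALID:
        return call
    return None  # assumption: anything else (NA, --, NN, ...) counts as missing


def recode_snp(calls):
    """Recode one SNP (one column) as minor-allele counts: 0, 1, 2 or None."""
    cleaned = [clean(c) for c in calls]

    # Count every allele observed at this SNP across all samples.
    counts = Counter()
    for genotype in cleaned:
        if genotype is not None:
            counts.update(genotype)

    if len(counts) > 2:
        raise ValueError(f"more than two alleles observed: {sorted(counts)}")

    # Most frequent allele first; ties are broken alphabetically so the
    # result is deterministic (an arbitrary choice made for this sketch).
    ranked = sorted(counts, key=lambda allele: (-counts[allele], allele))
    minor = ranked[1] if len(ranked) == 2 else None

    return [
        None if g is None else (g.count(minor) if minor else 0)
        for g in cleaned
    ]


def recode_matrix(rows):
    """Recode a samples-by-SNPs matrix given as a list of rows."""
    columns = [recode_snp(column) for column in zip(*rows)]
    return [list(row) for row in zip(*columns)]


# Toy data: 4 samples, 3 SNPs.
rows = [
    ["AA", "CT", "GG"],
    ["AG", "CC", "GG"],
    ["AA", "NA", "GG"],
    ["GG", "TT", "GG"],
]
print(recode_matrix(rows))
# [[0, 1, 0], [1, 0, 0], [0, None, 0], [2, 2, 0]]
```

Note that counting the minor allele is only one convention. If the downstream tool expects counts of a specific reference or alternate allele, the result for a given SNP may be flipped (0 and 2 swapped), so check that before merging with the datasets that are already coded 0 1 2.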