Hello,
I have SNP genotype data and would like to convert it to matrices of 0, 1, 2. I found the same question at http://www.biostars.org/post/show/15399/how-to-convert-snp-genotype-data-into-012-matrix/ but I could not find an answer there saying which package or software to use. Can someone provide a link to any such conversion tool?
I am not a bioinformatics person; I am basically a software guy. I would also like to know which software packages (more specifically, open-source tools written in Java or Python) are good for analyzing genomic SNP data and have reasonable documentation.
I tried to run the following code, but according to the snpMatrix class documentation (http://svitsrv25.epfl.ch/R-doc/library/snpMatrix/html/snp-class.html) the matrix is supposed to be an SNP class object. Can someone tell me how to convert the variable "val" into an snp matrix?
In short, I want to do the inverse of the solution given in this thread (http://www.biostars.org/post/show/14703/updated-r-package-to-analyse-eqtl-and-tutorials-available-for-the-association-of-genetic-variants-and-gene-expression/).
Code:
val <- read.table("C:/Users/vineeth/Desktop/Data.txt")
val
  V1 V2 V3
1 GG AA AG
2 GG AA AA
3 AA AA GG
coerce(from=val, to="numeric", strict=TRUE)
[1] "c(2, 2, 1)" "c(1, 1, 1)" "c(2, 1, 3)"
What is the format of the SNP genotype data you have? What would the 0, 1, 2 represent? Would 0 mean that the individual is homozygous for the reference allele? Or for the ancestral allele? Or something else? Do you want to preserve the phase of the genotypes? Note that if you convert your data to matrices of 0, 1, 2, you lose the phase of the data.
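To make the ambiguity concrete, here is a minimal Python sketch (not taken from any particular package) that codes the three-by-three example from the question twice, once counting copies of G and once counting copies of A. The function name count_allele is made up for illustration, and the sketch assumes unphased, biallelic, two-letter genotypes with no missing calls.

```python
def count_allele(genotype, counted):
    """Return how many copies of `counted` occur in a two-letter genotype."""
    if len(genotype) != 2:
        raise ValueError("expected a two-letter genotype, got %r" % genotype)
    return genotype.count(counted)


# The example table from the question (rows are individuals, columns are SNPs).
rows = [
    ["GG", "AA", "AG"],
    ["GG", "AA", "AA"],
    ["AA", "AA", "GG"],
]

# Same data, two different 0/1/2 matrices depending on which allele is counted.
matrix_counting_g = [[count_allele(g, "G") for g in row] for row in rows]
matrix_counting_a = [[count_allele(g, "A") for g in row] for row in rows]

print(matrix_counting_g)
print(matrix_counting_a)
```

Both matrices are valid 0, 1, 2 codings of the same genotypes, which is why the meaning of 0 has to be decided first. Note also that "AG" and "GA" both map to 1, so any phase information is gone after the conversion.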
Sorry, to be frank, I don't know much about bioinformatics. My job is just to analyze the data, and my first phase involves running a Random Forest.
You should at least know what the 0, 1, 2 in your output are supposed to represent, because there is more than one possible interpretation. Ask your boss: sit down with him and write a test case. Python's doctests are good for this.
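As a rough idea of what such a test case could look like, here is a small doctest sketch. It assumes one possible convention, namely that the value is the number of non-reference alleles, so 0 means homozygous for the reference allele. The function name encode_snp_column and the reference parameter are hypothetical, and the sketch only handles biallelic, two-letter genotypes with no missing values.

```python
def encode_snp_column(genotypes, reference):
    """Encode one SNP column as counts of the non-reference allele.

    Assumed convention (agree on this with whoever owns the data):
    0 = homozygous reference, 1 = heterozygous, 2 = homozygous non-reference.

    >>> encode_snp_column(["GG", "AG", "AA"], reference="G")
    [0, 1, 2]
    >>> encode_snp_column(["GG", "GA", "AA"], reference="A")
    [2, 1, 0]
    """
    # Each genotype has two alleles, so non-reference copies = 2 - reference copies.
    return [2 - genotype.count(reference) for genotype in genotypes]
```

Saved to a file, the examples in the docstring can be checked with python -m doctest -v followed by the file name. If your boss expects a different coding, the expected lists in the docstring are the only thing that needs to change before the function is adjusted to match.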