I want to use 23andMe raw data in SNPRelate. My data looks like this, and there are 500 individuals:
# rsid chromosome position genotype
rs4477212 1 82154 AA
rs3094315 1 752566 AA
rs3131972 1 752721 GG
But I first need to convert it to GDS format.
As I understand it, I have to define snp.id, sample.id, snp.position, etc., and also create the genotype node:
add.gdsn(newfile, "sample.id", sample.id)
add.gdsn(newfile, "snp.id", snp.id)
add.gdsn(newfile, "snp.position", snp.position)
add.gdsn(newfile, "snp.allele", c("A/G", "T/C", ...))
var.geno <- add.gdsn(newfile, "genotype",
                     valdim=c(length(snp.id), length(sample.id)), storage="bit2")
My understanding is that sample.id is the vector of all user IDs, snp.id is the vector of all SNP IDs, and so on. So, in the genotype part, how do I indicate that user x's genotype at SNP y is AA? What kind of matrix is it?
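To make the question concrete, here is a minimal sketch of how I currently picture the encoding. It assumes (from my reading of the SNPRelate documentation) that each cell holds the number of copies of the first allele listed in snp.allele (0, 1 or 2), with 3 marking a missing call. The helper name encode_genotype, the user IDs and the snp.allele values are made up for illustration only.

```r
# Hypothetical helper: count copies of the first allele of snp.allele
# in a 23andMe-style genotype string. Assumed coding: 0, 1, 2; 3 = missing.
encode_genotype <- function(genotype, snp.allele) {
  ref <- substr(snp.allele, 1, 1)   # allele before the "/"
  alt <- substr(snp.allele, 3, 3)   # allele after the "/"
  a1 <- substr(genotype, 1, 1)
  a2 <- substr(genotype, 2, 2)
  # Usable only if the call has two letters and both match ref or alt;
  # no-calls such as "--" fall through to 3
  valid <- !is.na(genotype) & nchar(genotype) == 2 &
    (a1 == ref | a1 == alt) & (a2 == ref | a2 == alt)
  ifelse(valid, (a1 == ref) + (a2 == ref), 3L)
}

snp.id     <- c("rs4477212", "rs3094315", "rs3131972")
sample.id  <- c("user1", "user2")
snp.allele <- c("A/G", "A/G", "G/A")   # made-up alleles, not looked up

# One column of genotype strings per user, rows in snp.id order
calls <- cbind(user1 = c("AA", "AA", "GG"),
               user2 = c("AG", "--", "AA"))

# SNP-by-sample matrix, matching valdim=c(length(snp.id), length(sample.id))
geno <- apply(calls, 2, encode_genotype, snp.allele = snp.allele)
dimnames(geno) <- list(snp.id, sample.id)
print(geno)
```

In this layout, user x's call at SNP y would sit at geno[y, x]. My guess is that a matrix like this is what gets written into var.geno, but please correct me if the coding is different.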
My second question is how I should determine the reference alleles. Should I compute them from my population of 500 people, or should I look them up somewhere else? If the latter, which source do you suggest?
Thank you so much.