Question

Using SNPRelate on a 23andMe dataset

9.6 years ago

abims ▴ 10

I want to use 23andMe raw data in SNPRelate. My data look like this, and there are 500 individuals:

# rsid    chromosome    position    genotype
rs4477212    1    82154    AA
rs3094315    1    752566    AA
rs3131972    1    752721    GG

But I need to first convert it to gds format.

I need to define snp.id, sample.id, snp.position, etc., and also create the genotype node:

add.gdsn(newfile, "sample.id", sample.id)
add.gdsn(newfile, "snp.id", snp.id)
add.gdsn(newfile, "snp.position", snp.position)
add.gdsn(newfile, "snp.allele", c("A/G", "T/C", ...))

var.geno <- add.gdsn(newfile, "genotype",
    valdim=c(length(snp.id), length(sample.id)), storage="bit2")

As I understand it, sample.id is the vector of all user IDs, snp.id is the vector of all SNP IDs, and so on. In the genotype part, how would I indicate that user x's genotype at SNP y is AA? What kind of matrix is it?
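For context on the matrix question: in SNPRelate's GDS format each cell of the genotype node holds a count of the first ("A") allele listed in snp.allele, so 2 = AA, 1 = AB, 0 = BB, and 3 = missing. With valdim=c(length(snp.id), length(sample.id)) the rows are SNPs and the columns are samples. Below is a minimal base-R sketch of that encoding. The helper name encode_genotype, the toy calls, and the choice of "A" as the first allele are assumptions for illustration only; they do not come from a real reference panel.

```r
# Encode two-letter 23andMe calls as counts of the first ("A") allele,
# following SNPRelate's convention: 2 = AA, 1 = AB, 0 = BB, 3 = missing.
# encode_genotype is a hypothetical helper, not part of SNPRelate.
encode_genotype <- function(genotype, a_allele) {
  # Split each call into its two alleles
  a1 <- substr(genotype, 1, 1)
  a2 <- substr(genotype, 2, 2)
  count <- (a1 == a_allele) + (a2 == a_allele)
  # No-calls ("--") and anything that is not two A/C/G/T letters
  # (e.g. single-letter haploid calls) are coded as missing
  valid <- grepl("^[ACGT]{2}$", genotype)
  count[!valid] <- 3L
  as.integer(count)
}

# Hypothetical calls for 3 SNPs (rows) in 2 individuals (columns)
calls <- matrix(c("AA", "AG", "GG",
                  "AG", "--", "GG"),
                nrow = 3,
                dimnames = list(c("rs1", "rs2", "rs3"), c("user1", "user2")))

# Assumed first allele of "A/G" for every SNP (illustrative only)
a_alleles <- c("A", "A", "A")

# Recycle the per-SNP allele down each column (matrices are column-major)
geno <- matrix(encode_genotype(calls, rep(a_alleles, times = ncol(calls))),
               nrow = nrow(calls), dimnames = dimnames(calls))
print(geno)
```

A SNP-by-sample matrix like geno could then, in principle, be passed to snpgdsCreateGeno() with snpfirstdim=TRUE, or written into the var.geno node with write.gdsn(); treat that as a starting point rather than a tested pipeline.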

My second question is: how should I determine the reference alleles? Should I compute them from my population of 500 people, or look them up somewhere else? If elsewhere, where do you suggest?

Thank you so much.

reference-alleles 23andme gds snprelate • 2.7k views

Updated 2.4 years ago by Ram 44k • written 9.6 years ago by abims ▴ 10

Ram · Answer 1 · 2015-04-29

9.6 years ago

Neilfws 49k

For the first part of the question: to get to GDS, I would first try converting the 23andMe data to VCF. A few tools claim to do this; the best I've found is here.

Then you can try snpgdsVCF2GDS() in the SNPRelate package to convert VCF to GDS.
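
As a rough sketch of that second step, assuming the VCF has already been produced by an external converter: the wrapper name convert_vcf_to_gds and the file names are hypothetical, while snpgdsVCF2GDS(), snpgdsOpen(), snpgdsClose(), and gdsfmt's read.gdsn()/index.gdsn() are the package functions being called. The wrapper does nothing if SNPRelate is not installed or the VCF is absent.

```r
# Sketch: convert a VCF (made from 23andMe raw data by some external tool)
# to GDS with SNPRelate, then peek at the genotype node.
# convert_vcf_to_gds is a hypothetical wrapper; file names are placeholders.
convert_vcf_to_gds <- function(vcf.fn, gds.fn) {
  if (!requireNamespace("SNPRelate", quietly = TRUE) || !file.exists(vcf.fn)) {
    message("SNPRelate not installed or VCF not found; nothing converted.")
    return(NULL)
  }
  # method = "biallelic.only" keeps only bi-allelic SNPs
  SNPRelate::snpgdsVCF2GDS(vcf.fn, gds.fn, method = "biallelic.only")

  # Open the new GDS file and make sure it is closed again on exit
  genofile <- SNPRelate::snpgdsOpen(gds.fn)
  on.exit(SNPRelate::snpgdsClose(genofile))

  # Read the genotype matrix (values 0, 1, 2, or 3 for missing)
  geno <- gdsfmt::read.gdsn(gdsfmt::index.gdsn(genofile, "genotype"))
  dim(geno)
}

# Hypothetical usage:
# convert_vcf_to_gds("my23andme.vcf", "my23andme.gds")
```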

Updated 2.4 years ago by Ram 44k • written 9.6 years ago by Neilfws 49k