What (if any) is the official name of the file format that 23andMe uses? I've only seen it referred to as "23andMe format". Are they complying with a more general format? If so, what is it? If not, why did they invent a new format?
According to this page: http://fileformats.archiveteam.org/wiki/23andMe
Raw genetic data is provided in the form of a tab delimited file (ZIPped up for distribution), containing the fields rsid, chromosome, position, genotype (e.g. rs3094315 1 742429 AG).
Comment lines begin with the # character.
The file name is of the form genome_Firstname_Lastname_20012345678901.txt, zipped as genome_Firstname_Lastname_20012345678901.zip.
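For anyone who wants to read the file directly, here is a minimal sketch in Python based only on the description above (tab-delimited fields, `#` comment lines). The names `Call` and `parse_23andme` are made up for illustration, and the sketch assumes every data line has exactly four fields; real exports may contain things this does not handle.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class Call(NamedTuple):
    """One data line of the file (hypothetical container, not an official type)."""
    rsid: str
    chromosome: str
    position: int
    genotype: str


def parse_23andme(handle: TextIO) -> Iterator[Call]:
    """Yield one Call per data line, skipping '#' comments and blank lines."""
    for line in handle:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        # Assumption: exactly four tab-separated fields per data line.
        rsid, chromosome, position, genotype = line.split("\t")
        yield Call(rsid, chromosome, int(position), genotype)


# Tiny in-memory example using the line quoted from the wiki page.
sample = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs3094315\t1\t742429\tAG\n"
)
calls = list(parse_23andme(sample))
```

For a real file you would pass `open("genome_Firstname_Lastname_....txt")` (after unzipping) instead of the `io.StringIO` object.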
Parsing to VCF is indeed straightforward. I was hoping to develop an app using 23andMe data, but to get the data into the environment it needs to be in a known NGS format, and I was hoping this was one. Rather than asking those operating the platform to accept this very specific data type, I was hoping to ask the broader, more general question of "can we include *.xxx type files?" I'll see what they say. Thanks.
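As a rough illustration of the VCF conversion mentioned here, the sketch below builds one VCF data line from one genotype call. The big caveat: the 23andMe file does not carry a reference allele, so `ref` has to come from somewhere else (a reference genome lookup, for instance). The function name `to_vcf_line` is hypothetical, the sketch assumes the genotype is reported on the same strand as the reference, and it deliberately returns `None` for anything that is not a two-letter A/C/G/T genotype (no-calls, single-letter calls, indel codes).

```python
from typing import Optional

BASES = set("ACGT")


def to_vcf_line(rsid: str, chrom: str, pos: int,
                genotype: str, ref: str) -> Optional[str]:
    """Return a tab-separated VCF data line, or None if the call is not
    a plain two-letter A/C/G/T genotype. `ref` is supplied by the caller."""
    if len(genotype) != 2 or not set(genotype) <= BASES or ref not in BASES:
        return None
    # Collect distinct non-reference alleles in order of appearance.
    alts = []
    for allele in genotype:
        if allele != ref and allele not in alts:
            alts.append(allele)
    # Map each allele to its VCF index: 0 = REF, 1.. = ALT alleles.
    index = {ref: 0}
    for i, allele in enumerate(alts, start=1):
        index[allele] = i
    gt = "/".join(str(n) for n in sorted(index[a] for a in genotype))
    alt_field = ",".join(alts) if alts else "."
    # Columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
    return "\t".join([chrom, str(pos), rsid, ref, alt_field,
                      ".", ".", ".", "GT", gt])


# The reference base "A" here is only an illustrative input, not a looked-up value.
line = to_vcf_line("rs3094315", "1", 742429, "AG", ref="A")
```

A complete file would also need the VCF header lines (`##fileformat=...` and the `#CHROM` column line), which this sketch leaves out.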
An oldie but goodie…
What if the variant is a deletion or insertion, particularly of more than one nucleotide? How could you differentiate an insertion of -/TT from a T/T SNP? Does anybody have a real sample file to look at?