Can you post a short example of a unit of data from the JSON file, and a brief example of the output that you would like to get from that sample data unit?
(written 10.2 years ago by Dan D; updated 2.9 years ago by Ram)
Hi Deedee,
The JSON files are huge, but they look like this:
Each pair of letters corresponds to one locus (mostly SNPs, but sometimes indels). A double underscore (__) denotes a missing genotype. We need a MAP file to interpret the JSON files correctly.
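A minimal R sketch of that encoding, assuming each genome value is a plain string with exactly two characters per locus, in the same order as the key/MAP file. The helper name split_genome and the sample string are made up for illustration. It splits the string into one genotype per locus and counts the missing (__) calls:

```r
# Hypothetical helper: split a genome string into one two-character
# genotype per locus (assumes two characters per locus, in MAP order).
split_genome <- function(genome) {
  n <- nchar(genome)
  if (n %% 2 != 0) stop("genome string length must be even")
  if (n == 0) return(character(0))
  starts <- seq(1, n, by = 2)
  substring(genome, starts, starts + 1)
}

g <- split_genome("AGCT__TT")   # made-up example string
print(g)                        # "AG" "CT" "__" "TT"
missing <- g == "__"            # double underscore = missing genotype
cat("missing loci:", sum(missing), " call rate:", mean(!missing), "\n")
```

Indel codes, whatever letters they use, are passed through unchanged; only the position in the string says which locus a pair belongs to.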
PED files include the following fields (one line per individual):
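For reference, a standard PLINK PED line starts with six columns (family ID, individual ID, paternal ID, maternal ID, sex, phenotype), followed by two allele columns per locus. Because each JSON record only has id and genome, a sketch has to assume values for the rest: here family and individual ID are both set to id, parents to 0, sex to 0 (unknown) and phenotype to -9 (missing). The function name make_ped_row is hypothetical:

```r
# Hypothetical helper: one PLINK-style PED line from an id and a genome
# string (two characters per locus, "__" = missing genotype).
# Assumed fill-ins: FID = IID = id, no parents (0), sex unknown (0),
# phenotype missing (-9).
make_ped_row <- function(id, genome) {
  if (nchar(genome) %% 2 != 0) stop("genome string length must be even")
  # Each character is one allele, so the string already lists the alleles
  # in PED order: allele 1 and 2 of locus 1, then of locus 2, and so on.
  alleles <- strsplit(genome, "", fixed = TRUE)[[1]]
  alleles[alleles == "_"] <- "0"  # "__" becomes "0 0", PLINK's missing genotype
  paste(c(id, id, "0", "0", "0", "-9", alleles), collapse = " ")
}

make_ped_row("user1", "AGCT__TT")
# "user1 user1 0 0 0 -9 A G C T 0 0 T T"
```

Writing one such line per individual (for example with writeLines) gives the .ped file; its allele columns only make sense together with a MAP file that lists the loci in the same order.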
I see. So if "id" and "genome" are the only two properties for each data unit, then it's obviously not a translation of key-value pairs to a flat table.
I don't know of any tool that can do the processing work, but I'll check around as soon as I have time. Thanks for uploading that!
(written 10.2 years ago by Dan D; updated 2.9 years ago by Ram)
I do not know what you mean by 4-column 23andMe format, but here is something you can do in R to go from .JSON to a merged data frame (called merged.file in this example) that you can then use to construct your .ped and .map files (.ped and .map are whitespace-delimited text files, so tabs work):
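install.packages("jsonlite")   # only needed once
install.packages("plyr")
library(jsonlite); library(plyr)
file1 <- fromJSON("/../.JSON")   # read a single file
# instead of entering file1, file2, ... by hand, read every JSON file in a folder
# ("json_dir" is a placeholder for your own folder)
json_files <- list.files("json_dir", pattern = "\\.json$", ignore.case = TRUE, full.names = TRUE)
dfList <- lapply(json_files, fromJSON)   # all your files as a list named dfList
merged.file <- join_all(dfList)          # merge them all on their common columns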
Once joined, you can manipulate the merged data frame to create the .ped and .map files.
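One caveat, as a hedged sketch: if every file only holds id and genome records, join_all's default left join on the shared columns keeps just the rows of the first data frame, so stacking the records with base R's rbind may be closer to what is needed. The two small data frames below are made up and stand in for parsed JSON files:

```r
# Made-up records standing in for two parsed JSON files.
file1 <- data.frame(id = c("u1", "u2"), genome = c("AGCT", "AA__"),
                    stringsAsFactors = FALSE)
file2 <- data.frame(id = "u3", genome = "GGCT", stringsAsFactors = FALSE)
dfList <- list(file1, file2)

# Stack the records: one row per individual across all files.
stacked <- do.call(rbind, dfList)

# Drop accidental repeats of the same individual (keeps the first copy).
stacked <- stacked[!duplicated(stacked$id), ]
print(stacked)
```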
Hope this helps!
Kiz
(written 10.2 years ago by Kizuna; updated 2.9 years ago by Ram)
If you don't have accompanying key files for the JSONs, you'll probably need to re-grab the genomic data. See here; that includes a link to the current genome-string-index-to-variant-info file, but it's updated every once in a while. Since you mention that some of your JSONs are missing some SNPs, it sounds like they aren't all compatible with the current key.
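A quick way to spot JSONs that don't match the current key, sketched under the assumption that the key has exactly one row per locus in genome-string order. Its real column layout isn't described here, so the key and genome strings below are made-up stand-ins. Since each locus takes two characters, a compatible genome string should be exactly twice as long as the key has rows:

```r
# Made-up stand-in for the genome-string-index-to-variant-info key:
# one row per locus, in the same order as the genome string.
key <- data.frame(chrom = c("1", "1", "2"),
                  snp   = c("snp1", "snp2", "snp3"),
                  pos   = c(100L, 200L, 50L),
                  stringsAsFactors = FALSE)

# Made-up genome strings; u2 is one locus short.
genomes <- c(u1 = "AGCTTT", u2 = "AGCT")

# Two characters per locus, so a compatible string has 2 * nrow(key) characters.
compatible <- nchar(genomes) == 2 * nrow(key)
incompatible_ids <- names(genomes)[!compatible]
print(incompatible_ids)   # "u2"
```

A matching length is necessary but not sufficient: a string produced against a different key of the same size would still pass this check.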
MAP files include the following fields:
(The genetic distance is irrelevant and can be set to 0.)
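A minimal sketch of writing the MAP file in R, assuming the key is available as a data frame with chromosome, identifier and base-pair position columns in genome-string order (the column names chrom, snp and pos and their values are made up). Each PLINK MAP line holds chromosome, variant identifier, genetic distance and base-pair position; the distance column is simply filled with 0:

```r
# Made-up key: one row per locus, in genome-string order.
key <- data.frame(chrom = c("1", "1", "2"),
                  snp   = c("snp1", "snp2", "snp3"),
                  pos   = c(100L, 200L, 50L),
                  stringsAsFactors = FALSE)

# PLINK MAP columns: chromosome, identifier, genetic distance, bp position.
map <- data.frame(chrom = key$chrom, snp = key$snp, cm = 0, pos = key$pos)

# tempfile() keeps the example self-contained; use your own path instead,
# e.g. a .map file named to match the .ped file.
map_file <- tempfile(fileext = ".map")
write.table(map, map_file, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = FALSE)
readLines(map_file)
```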
I was hoping that there might be some statistical package or tool that does this job instead of having to write code from scratch.
All the best,
Yorgos