13 months ago
CTLong
Hi all,
I have downloaded a series of phenotype and genotype data from dbGaP but have problems opening the data dict files with the XML extension. What is the recommended way to parse these files?
Furthermore, is there any valuable information contained within these files? All I can see in them (without parsing) is the study accession and the XSL stylesheet. There isn't any information regarding the individual samples of the dataset.
What kind of information do you need from those files?
Hi Pierre, thanks for the reply. That is what I'm trying to figure out. I suspect most of the metadata could be found in the associated text files. Just not sure if there is any valuable information for individual samples in these XML files.
Well, you tell us. We don't know which XML file you're looking at. For example, I only see phenotypes in this random XML file: view-source:https://ftp.ncbi.nlm.nih.gov/dbgap/studies/phs000001/phs000001.v1.p1/archive/phs000001.AREDS.pht000001.v1.p1.datadict.xml
I think I successfully opened the XML file with Excel yesterday. Not much information to take out of these files in my case. Thanks!
Excel?? XML is just a text file. Why not something like cat or more??
I tried using cat and more, but the formatting looks unfamiliar, so I tried opening it with Excel. I guess the information is the same, but it is better presented?
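For anyone who wants something between cat and Excel, here is a minimal sketch of parsing a data dict file with Python's standard library. It assumes the file has a data_table root with variable children that carry name, description, type and optional value elements with a code attribute. The SAMPLE string below is hand-written for illustration and is not copied from a real dbGaP file, so check the element names against your own download.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample. The element and attribute names are an
# assumption about the data dict layout, not taken from a real dbGaP file.
SAMPLE = """<data_table id="pht000001.v1" study_id="phs000001.v1">
  <variable id="phv00000001.v1">
    <name>ID2</name>
    <description>Participant identifier</description>
    <type>integer</type>
  </variable>
  <variable id="phv00000002.v1">
    <name>SEX</name>
    <description>Sex of participant</description>
    <type>encoded value</type>
    <value code="1">Male</value>
    <value code="2">Female</value>
  </variable>
</data_table>
"""


def parse_data_dict(xml_text):
    """Return one dict per <variable> element found in the XML text."""
    root = ET.fromstring(xml_text)
    rows = []
    for var in root.iter("variable"):
        rows.append({
            "id": var.get("id"),
            "name": var.findtext("name", default=""),
            "description": var.findtext("description", default=""),
            "type": var.findtext("type", default=""),
            # Map each coded value to its label, e.g. "1" -> "Male".
            "codes": {v.get("code"): (v.text or "") for v in var.findall("value")},
        })
    return rows


variables = parse_data_dict(SAMPLE)
for v in variables:
    print(v["id"], v["name"], v["type"], v["codes"], sep="\t")
```

For a file on disk, ET.parse(path).getroot() can replace ET.fromstring. If the layout matches the assumption above, the output is a list of variable definitions (names, types, value codes) rather than per-sample records, which would fit with the per-sample data living in the associated text files.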