Parse NCBI XML sample information into a data frame in R
5.7 years ago
willnotburn ▴ 50

There are R packages for manipulating XML, but after two full days of trying I have largely given up. Surely there is an existing tool that takes a sample metadata file downloaded from NCBI in XML format (covering all samples in an NCBI project) and converts it into a data frame in R. I don't need to search or modify anything. I would greatly appreciate help with this.

R NCBI XML • 3.0k views
5.7 years ago
Ahill ★ 2.0k

Have you tried the R packages XML, xml2, or rentrez? They provide methods for converting XML to a data.frame or other R structures. They can take some time to learn, depending on the complexity of your XML, but they can work.

Which NCBI XML format are you parsing? Given this XML output for 8 BioSamples:

https://www.ncbi.nlm.nih.gov/biosample?LinkName=bioproject_biosample_all&from_uid=356160

XML::xmlToDataFrame() gives this:

> df <- xmlToDataFrame("biosample_result.xml")
> str(df)
'data.frame':   8 obs. of  8 variables:
 $ Ids        : Factor w/ 8 levels "SAMD00020611DRS036183",..: 8 7 6 5 4 3 2 1
 $ Description: Factor w/ 4 levels "Human diploid fibroblasts TIG-3 expressing hTert/SV40/c-MycHomo sapiens",..: 3 3 2 2 1 1 4 4
 $ Owner      : Factor w/ 1 level "Kyushu University Medical Institute of Bioregulation, Kyushu University Nakayama Lab.": 1 1 1 1 1 1 1 1
 $ Models     : Factor w/ 1 level "Generic": 1 1 1 1 1 1 1 1
 $ Package    : Factor w/ 1 level "Generic.1.0": 1 1 1 1 1 1 1 1
 $ Attributes : Factor w/ 8 levels "TIG-3 Parental_A2012-08-10TIG-3Fibroblast",..: 4 3 8 7 6 5 2 1
 $ Links      : Factor w/ 1 level "356160": 1 1 1 1 1 1 1 1
 $ Status     : Factor w/ 1 level "": 1 1 1 1 1 1 1 1

Nested elements such as 'Attributes' are collapsed into a single string per sample with this approach; you would need to descend into those elements if you want to split them into separate columns.
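
As a rough sketch of what "descending" could look like, here is one way to do it with xml2. It assumes each sample is a BioSample element carrying an accession attribute, and that each child of Attributes is an Attribute element with an attribute_name attribute; check those names against your own file. The inline XML and the SAMPLE_A / SAMPLE_B accessions are made up for illustration, not taken from the real download.

```r
library(xml2)

# Hypothetical miniature of a BioSample export. Element and attribute names
# are assumptions; adjust the XPath expressions if your file differs.
xml_in <- '<BioSampleSet>
  <BioSample accession="SAMPLE_A">
    <Attributes>
      <Attribute attribute_name="cell_line">TIG-3</Attribute>
      <Attribute attribute_name="cell_type">Fibroblast</Attribute>
    </Attributes>
  </BioSample>
  <BioSample accession="SAMPLE_B">
    <Attributes>
      <Attribute attribute_name="cell_line">TIG-3</Attribute>
    </Attributes>
  </BioSample>
</BioSampleSet>'

doc <- read_xml(xml_in)  # for a downloaded file: read_xml("biosample_result.xml")
samples <- xml_find_all(doc, "//BioSample")

# Long format: one row per (sample, attribute) pair
long <- do.call(rbind, lapply(samples, function(s) {
  attrs <- xml_find_all(s, "./Attributes/Attribute")
  data.frame(
    accession = rep(xml_attr(s, "accession"), length(attrs)),
    name      = xml_attr(attrs, "attribute_name"),
    value     = xml_text(attrs),
    stringsAsFactors = FALSE
  )
}))

# Wide format: one row per sample, one column per attribute name.
# Attributes a sample does not have become NA.
wide <- reshape(long, idvar = "accession", timevar = "name", direction = "wide")
names(wide) <- sub("^value\\.", "", names(wide))
rownames(wide) <- NULL
```

The long table is usually the safer intermediate, since samples in the same project do not always share the same set of attributes; the reshape step only fills in NA where one is missing.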
