This is a follow-up to this question: http://biostar.stackexchange.com/questions/17333/is-there-an-r-library-similar-to-libraries-like-bioperl-biopython-or-bioruby-m
This is a problem in R using the XML package. I have 2 PubMed articles and I need to select only certain IDs, and only from certain databases. I cannot work out how to search by element value using XPath in R.
Here is my code:
library(XML)
# this PMID has GEO IDs
url1 = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=21558518&retmode=xml"
# this PMID has Clinical Trials IDs
url2 = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=21830967&retmode=xml"
# parse into internal (C-level) nodes so that XPath queries can be run on the result
xml1 = xmlTreeParse(url1, useInternalNodes = TRUE)
xml2 = xmlTreeParse(url2, useInternalNodes = TRUE)
ns1 <- getNodeSet(xml1, '//DataBank/DataBankName')
ns2 <- getNodeSet(xml2, '//DataBank/DataBankName')
ns1
ns2
I need to modify the XPath to select only the entries where DataBankName is 'ClinicalTrials.gov' or 'ISRCTN'. A record that contains ISRCTN is this one:
url3 = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=21675889&retmode=xml"
xml3 = xmlTreeParse(url3, useInternalNodes = TRUE)
I need the IDs stored in the AccessionNumberList element:
(ns <- getNodeSet(xml1, '//DataBank'))
It looks like this:
<DataBank>
  <DataBankName>GEO</DataBankName>
  <AccessionNumberList>
    <AccessionNumber>GSE25055</AccessionNumber>
    <AccessionNumber>GSE25065</AccessionNumber>
    <AccessionNumber>GSE25066</AccessionNumber>
  </AccessionNumberList>
</DataBank>
I tried several ways of matching on an element value in XPath but could not solve it. (Any other solution that bypasses XPath is fine too.)
Here is what I need (but it gives me an error):
ns <- getNodeSet(xml1, '//DataBank/DataBankName[text()="ClinicalTrials.gov" or text()="ISRCTN"]/../AccessionNumberList/AccessionNumber')
Yes, I tried it: it returns an XMLNodeSet with the 2 accession numbers (from xml3).
Thanks. Yes, that is smart: it does not require backtracking to the parent. I did not see that in any XPath examples on the net.
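
For readers who land here without the answer text: the remark about not backtracking to the parent most likely refers to putting the predicate on the DataBank element itself rather than on DataBankName followed by "/..". That is an assumption, since the suggested expression is not quoted above. Below is a minimal sketch of the idea on an inline sample, so it runs without network access. The DataBankList wrapper and the ClinicalTrials.gov and ISRCTN accession numbers are made-up placeholders, not values from the PubMed records.

```r
library(XML)

# Hypothetical sample modeled on the <DataBank> structure shown in the question.
# The NCT... and ISRCTN... accession numbers are placeholders, not real IDs.
sample_xml <- '<DataBankList>
  <DataBank>
    <DataBankName>GEO</DataBankName>
    <AccessionNumberList>
      <AccessionNumber>GSE25055</AccessionNumber>
    </AccessionNumberList>
  </DataBank>
  <DataBank>
    <DataBankName>ClinicalTrials.gov</DataBankName>
    <AccessionNumberList>
      <AccessionNumber>NCT00000001</AccessionNumber>
    </AccessionNumberList>
  </DataBank>
  <DataBank>
    <DataBankName>ISRCTN</DataBankName>
    <AccessionNumberList>
      <AccessionNumber>ISRCTN00000001</AccessionNumber>
    </AccessionNumberList>
  </DataBank>
</DataBankList>'

# Parse the string into internal nodes so XPath can be applied.
doc <- xmlParse(sample_xml, asText = TRUE)

# The predicate sits on the parent <DataBank>, comparing its DataBankName child,
# so the path continues straight down and no "/.." step is needed.
xp <- paste0(
  '//DataBank[DataBankName="ClinicalTrials.gov" or DataBankName="ISRCTN"]',
  '/AccessionNumberList/AccessionNumber'
)

# Extract the text of each matching <AccessionNumber> as a character vector.
ids <- xpathSApply(doc, xp, xmlValue)
print(ids)
```

On this sample the query should skip the GEO entry and keep only the two placeholder IDs. The same expression can be passed to getNodeSet(xml3, ...) when the nodes themselves are wanted rather than their text.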