I would like to get the full lineage tree downstream of a phylum (all subcategories down to species level) in XML format from the NCBI Taxonomy database. I tried using NCBI E-utilities: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=taxonomy&id=1239&retmode=xml. However, that request does not return the children of the node. Is there an easy way of obtaining such a file?
You could use the BioPortal SPARQL endpoint to obtain the children. The following SPARQL query will obtain the children, grandchildren and great-grandchildren of your node. You need to adapt the query to the maximum depth of the tree under scrutiny.
If you run this query in your preferred browser and copy the resulting URL, you can use that URL to iterate over the different subclasses. To submit SPARQL queries programmatically, you first need to get an API key.
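As a rough sketch of what a fixed-depth query could look like, the Python snippet below builds a SPARQL string that nests one rdfs:subClassOf pattern per level and wraps it in a request URL. The endpoint, the rdfs/skos prefixes and the NCBITaxon IRI pattern come from the curl command in this answer; the helper names (build_depth_query, build_request_url), the variable names and the use of nested OPTIONAL blocks are my own assumptions, and the API key is a placeholder.

```python
from urllib.parse import urlencode

ENDPOINT = "http://sparql.bioontology.org/sparql"  # endpoint used by the curl command
TAXON_IRI = "http://purl.obolibrary.org/obo/NCBITaxon_{}"
PREFIXES = (
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
)


def build_depth_query(taxon_id, depth=3):
    """Build a query for descendants down to `depth` levels (hypothetical helper)."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    ids = " ".join("?id{}".format(i) for i in range(1, depth + 1))
    body = ""
    # Build from the deepest level outwards so each level wraps the one below it.
    for level in range(depth, 0, -1):
        if level == 1:
            parent = "<{}>".format(TAXON_IRI.format(taxon_id))
        else:
            parent = "?n{}".format(level - 1)
        block = (
            "?n{l} rdfs:subClassOf {p} .\n"
            "?n{l} skos:notation ?id{l} .\n"
        ).format(l=level, p=parent)
        if body:
            # OPTIONAL keeps nodes that have no children at the next level.
            block += "OPTIONAL {\n" + body + "}\n"
        body = block
    return PREFIXES + "SELECT DISTINCT {} WHERE {{\n{}}}".format(ids, body)


def build_request_url(query, api_key):
    """URL-encode the query and API key the same way the curl command does."""
    return ENDPOINT + "?" + urlencode({"query": query, "apikey": api_key})
```

Calling build_request_url(build_depth_query(1239, depth=3), "YOUR_API_KEY") gives a URL you could paste into a browser or pass to curl. For a deep tree the nested query grows quickly, so querying one level at a time is probably more practical.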
The following pipeline would get what you want:
1. Get the children. You need to provide the API key and the taxonomy ID in the URL (in braces and capitals):
   curl "http://sparql.bioontology.org/sparql?query=PREFIX+omv%3A+%3Chttp%3A%2F%2Fomv.ontoware.org%2F2005%2F05%2Fontology%23%3E%0D%0APREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0D%0APREFIX+skos%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2004%2F02%2Fskos%2Fcore%23%3E%0D%0ASELECT+DISTINCT+%3Ftaxonid%0D%0AWHERE+%7B%0D%0A%09%3Fchild+rdfs%3AsubClassOf+%3Chttp%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FNCBITaxon_{$TAXONID}%3E++.%0D%0A++++++++%3Fchild+skos%3Anotation+%3Ftaxonid+.%0D%0A%7D%0D%0A++++++++&apikey={YOUR API KEY HERE}"
2. Extract the taxon ID of each child.
3. Use E-utilities (efetch) to get the XML of each child.
4. Repeat from step 1 for each child until the full tree is processed.
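The pipeline above could be sketched in Python roughly as follows. This is only an illustration under some assumptions: that the endpoint returns the standard SPARQL JSON results format when asked for it through the Accept header, and that the skos:notation value is the bare taxonomy ID. The function names (fetch_children, parse_taxon_ids, efetch_url, walk_tree) are hypothetical, and the API key is a placeholder.

```python
import json
from collections import deque
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SPARQL_ENDPOINT = "http://sparql.bioontology.org/sparql"
EFETCH = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Same one-level query as in the curl command, without the unused omv prefix.
CHILD_QUERY = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?taxonid
WHERE {{
    ?child rdfs:subClassOf <http://purl.obolibrary.org/obo/NCBITaxon_{taxon_id}> .
    ?child skos:notation ?taxonid .
}}"""


def parse_taxon_ids(sparql_json):
    """Extract the ?taxonid values from a SPARQL JSON results document."""
    bindings = json.loads(sparql_json)["results"]["bindings"]
    return [b["taxonid"]["value"] for b in bindings]


def fetch_children(taxon_id, api_key):
    """Query the endpoint for direct children (assumes it honors the Accept header)."""
    params = urlencode(
        {"query": CHILD_QUERY.format(taxon_id=taxon_id), "apikey": api_key}
    )
    request = Request(
        SPARQL_ENDPOINT + "?" + params,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request) as response:
        return parse_taxon_ids(response.read().decode("utf-8"))


def efetch_url(taxon_id):
    """Build the E-utilities URL that returns the XML record of one taxon."""
    return EFETCH + "?" + urlencode(
        {"db": "taxonomy", "id": taxon_id, "retmode": "xml"}
    )


def walk_tree(root_id, get_children):
    """Breadth-first walk; returns a dict mapping each taxon ID to its child IDs."""
    tree = {}
    queue = deque([str(root_id)])
    while queue:
        node = queue.popleft()
        if node in tree:  # skip nodes that were already expanded
            continue
        tree[node] = list(get_children(node))
        queue.extend(tree[node])
    return tree
```

A call such as walk_tree("1239", lambda t: fetch_children(t, "YOUR_API_KEY")) would then give the parent-to-children mapping, and efetch_url can be applied to each key to download the XML. A whole phylum means a large number of requests, so pausing between calls is probably wise.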