I can't figure out how to use the "-match" syntax even after reading all the documentation I could find. I get these errors:
$ cat xml.txt | xtract -pattern Gene-commentary -match Gene-commentary_type:1
Unrecognized argument '-match'
No -element before 'Gene-commentary_type:1'
$ cat xml.txt | xtract -element Gene-commentary -match Gene-commentary_type:1
Unrecognized argument '-match'
No -element before 'Gene-commentary_type:1'
What am I doing wrong?
What I am trying to do is pull the accession of the reference sequence and the coordinates of the region for a given entry in NCBI Gene (see "Retrieve all FASTA RefSeq files for a given entry in NCBI gene?") so that I can run efetch -format FASTA -seqstart -seqend and get the appropriate results.
I could parse the XML in Python to do it, but it really seems like I should be able to do this in "one line" using Entrez Direct, if only I could get -match to work :/
Here is what the XML looks like. Say I have a gene record in XML:
epost -db gene -id 672 | efetch -format xml > xml.txt
According to the outline,
cat xml.txt | xtract -outline
<Gene-commentary>
  <Gene-commentary_type value="genomic">1</Gene-commentary_type>
  <Gene-commentary_heading>Reference assembly</Gene-commentary_heading>
  <Gene-commentary_label>RefSeqGene</Gene-commentary_label>
  <Gene-commentary_accession>NG_005905</Gene-commentary_accession>
  <Gene-commentary_version>2</Gene-commentary_version>
  <Gene-commentary_seqs>
    <Seq-loc>
      <Seq-loc_int>
        <Seq-interval>
          <Seq-interval_from>92500</Seq-interval_from>
          <Seq-interval_to>173688</Seq-interval_to>
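For comparison, the Python fallback I mentioned would look roughly like the sketch below. It only uses the standard library. The SAMPLE string is my own trimmed copy of the fragment in this post, with closing tags added so that it parses, and genomic_regions is a hypothetical helper name, not anything from Entrez Direct. I am also assuming that the accession, version and interval sit as direct children in the order shown; a full record may need more care.

```python
import xml.etree.ElementTree as ET

# Trimmed sample mirroring the fragment in this post.
# Closing tags were added by hand so the snippet is well-formed (assumption).
SAMPLE = """<Gene-commentary>
  <Gene-commentary_type value="genomic">1</Gene-commentary_type>
  <Gene-commentary_heading>Reference assembly</Gene-commentary_heading>
  <Gene-commentary_label>RefSeqGene</Gene-commentary_label>
  <Gene-commentary_accession>NG_005905</Gene-commentary_accession>
  <Gene-commentary_version>2</Gene-commentary_version>
  <Gene-commentary_seqs>
    <Seq-loc>
      <Seq-loc_int>
        <Seq-interval>
          <Seq-interval_from>92500</Seq-interval_from>
          <Seq-interval_to>173688</Seq-interval_to>
        </Seq-interval>
      </Seq-loc_int>
    </Seq-loc>
  </Gene-commentary_seqs>
</Gene-commentary>"""

# Path from a Gene-commentary element down to its first Seq-interval.
INTERVAL = "Gene-commentary_seqs/Seq-loc/Seq-loc_int/Seq-interval"


def genomic_regions(root):
    """Yield (accession.version, from, to) for each Gene-commentary of type 1.

    Hypothetical helper: it only inspects direct children of each
    Gene-commentary element, so nested commentaries are checked on their own.
    """
    for gc in root.iter("Gene-commentary"):
        if gc.findtext("Gene-commentary_type") != "1":
            continue  # keep only the "genomic" commentaries
        acc = gc.findtext("Gene-commentary_accession")
        ver = gc.findtext("Gene-commentary_version")
        start = gc.findtext(INTERVAL + "/Seq-interval_from")
        stop = gc.findtext(INTERVAL + "/Seq-interval_to")
        if acc is None or start is None or stop is None:
            continue  # skip commentaries without an accession or interval
        name = f"{acc}.{ver}" if ver else acc
        yield name, int(start), int(stop)


if __name__ == "__main__":
    # For the real file, ET.parse("xml.txt").getroot() would replace this.
    for region in genomic_regions(ET.fromstring(SAMPLE)):
        print(region)  # prints ('NG_005905.2', 92500, 173688)
```

The coordinates are returned exactly as they appear in the XML; whether they need shifting before being passed to efetch is something I have not checked.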
I have read:
http://www.ncbi.nlm.nih.gov/books/NBK179288/ (I followed these instructions to install it, so which epost returns ~/edirect/epost)
http://www.ncbi.nlm.nih.gov/news/02-06-2014-entrez-direct-released/?campaign=facebook-02072014
http://elane.stanford.edu/laneconnex/public/media/documents/EntrezDirect.pdf
Thanks for answering my specific question! It's weird their error message says "before" :/ I wasn't sure who to give the checkmark to, but I think Pierre Lindenbaum answered my actual question.