Attempting To Utilise The New Entrez Direct Package But Having Difficulty With PubMed And Nucleotide XML Parsing
10.8 years ago
Daniel ★ 4.0k

The new tool appears to do exactly what I want and I was keen to try it out, but I'm having some difficulty.

I am trying to retrieve sequences for a single gene from across a taxonomic subtree and produce a table of Accession, Author(s), Affiliation and Title. The table is for collaborators, who will use it to identify trusted sources; I will retrieve the chosen FASTA sequences at a later date.

For parsing PubMed records the documentation is quite clear, and I can confirm that this works for me:

esearch -db pubmed -query "Garber ED [AUTH] AND PNAS [JOUR]" | elink -related | efilter -query "mouse" | efetch -format docsum | xtract -pattern DocumentSummary -element Id SortFirstAuthor Title
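The last step of that pipeline can be approximated in Python with ElementTree. For each -pattern element, it prints the requested child elements as tab-separated columns. This is a sketch for intuition only, not xtract's implementation. The input is a hypothetical, heavily abbreviated document summary with placeholder values:

```python
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated docsum XML with placeholder values; real
# `efetch -format docsum` output carries many more fields per record.
SAMPLE = """<DocumentSummarySet>
  <DocumentSummary><Id>111</Id><SortFirstAuthor>Author A</SortFirstAuthor><Title>First title</Title></DocumentSummary>
  <DocumentSummary><Id>222</Id><SortFirstAuthor>Author B</SortFirstAuthor><Title>Second title</Title></DocumentSummary>
</DocumentSummarySet>"""


def rows(xml_text, pattern, elements):
    """One tab-separated line per `pattern` element, one column per name in `elements`."""
    root = ET.fromstring(xml_text)
    lines = []
    for record in root.iter(pattern):
        cols = []
        for name in elements:
            # findtext returns None when the child is absent; this sketch then
            # emits an empty column (xtract's own handling may differ).
            cols.append(record.findtext(name) or "")
        lines.append("\t".join(cols))
    return lines


for line in rows(SAMPLE, "DocumentSummary", ["Id", "SortFirstAuthor", "Title"]):
    print(line)
```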

I am now searching the nucleotide database, but I cannot retrieve the Authors or other details with xtract, and I can't find any examples of how to do so.

My best attempt is as follows, but it only gives the Id:

esearch -db nucleotide -query "txid2836[Organism:exp] AND rbcl[GENE]" | efetch -format docsum | xtract -pattern DocumentSummary -element Id Authors
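If the nucleotide document summary simply has no Authors field, that would explain getting only the Id back, because xtract has nothing to extract. The following is a minimal Python sketch for checking which fields a summary actually carries. It runs on a hypothetical, abbreviated nucleotide summary: the field names and values are placeholders, not real NCBI output.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated nucleotide document summary (placeholder values).
# Field names are illustrative; inspect your real output (e.g. with
# `xtract -outline`) to see what your records actually contain.
SAMPLE = """<DocumentSummarySet>
  <DocumentSummary>
    <Id>999</Id>
    <Caption>XX000001</Caption>
    <Title>Example organism rbcL gene, partial cds</Title>
    <Organism>Example organism</Organism>
  </DocumentSummary>
</DocumentSummarySet>"""


def child_fields(xml_text, pattern="DocumentSummary"):
    """Return the sorted names of direct children found under any `pattern` element."""
    root = ET.fromstring(xml_text)
    names = set()
    for record in root.iter(pattern):
        names.update(child.tag for child in record)
    return sorted(names)


fields = child_fields(SAMPLE)
print(fields)
print("Authors present:", "Authors" in fields)
```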

Alternatively, I have tried using efetch -format xml and extracting the information with xtract, but I can't work out how to select the correct hierarchy level. From the documentation:

The xtract function is used for processing XML data:

Exploration Argument Hierarchy
-pattern       (Highest Rank)
-division
-group
-branch
-block
-section
-subset
-unit          (Lowest Rank)
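The ranked arguments let one exploration run inside another, so an inner argument only searches within the element currently selected by the outer one. As a rough analogy only (plain Python, not xtract's implementation), the nesting behaves like nested loops. The tag names Record, Acc, Author and Name below are hypothetical placeholders:

```python
import xml.etree.ElementTree as ET

# Hypothetical XML; tag names are placeholders, not NCBI's real element names.
SAMPLE = """<Set>
  <Record><Acc>A1</Acc><Author><Name>Smith J</Name></Author><Author><Name>Lee K</Name></Author></Record>
  <Record><Acc>A2</Acc><Author><Name>Wong P</Name></Author></Record>
</Set>"""


def nested(xml_text, outer, key, inner, field):
    """One line per `outer` element: its `key` text, then `field` from each `inner` element inside it."""
    root = ET.fromstring(xml_text)
    lines = []
    for rec in root.iter(outer):
        cols = [rec.findtext(key) or ""]
        # The inner search is scoped to the current outer element only,
        # so authors from one record never leak into another record's line.
        for blk in rec.iter(inner):
            cols.append(blk.findtext(field) or "")
        lines.append("\t".join(cols))
    return lines


for line in nested(SAMPLE, "Record", "Acc", "Author", "Name"):
    print(line)
```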

One such attempt looks like this:

esearch -db nucleotide -query "txid2836[Organism:exp] AND rbcl[GENE]" | efetch -format xml | xtract -division Authors -unit Name
10.8 years ago
Neilfws 49k

At the moment I can't get your second example to run to completion, so I'll use a simpler query:

esearch -db nucleotide -query "NM_182762.3"

The first step is to run xtract with the -outline option, to see what is in the XML:

esearch -db nucleotide -query "NM_182762.3" | efetch -format xml | xtract -outline > ed.out

If you examine the file ed.out, you will see the hierarchy for an author:

Author
    Author_name
        Person-id
            Person-id_ml
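Reading an outline like this is easier once you see what it represents: element names indented by depth, with text and attributes left out. The few lines of Python below mimic that idea for illustration. They are not a reimplementation of xtract -outline, and the input is a hand-written fragment shaped like the Author hierarchy found in ed.out:

```python
import xml.etree.ElementTree as ET

# Hand-written fragment shaped like the Author hierarchy (not real efetch output).
SAMPLE = "<Author><Author_name><Person-id><Person-id_ml>Ren B</Person-id_ml></Person-id></Author_name></Author>"


def outline(elem, depth=0, out=None):
    """Return element names, one per line, indented four spaces per level (text ignored)."""
    if out is None:
        out = []
    out.append("    " * depth + elem.tag)
    for child in elem:
        outline(child, depth + 1, out)
    return out


print("\n".join(outline(ET.fromstring(SAMPLE))))
```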

Now run xtract again, setting -pattern to the element one level above the -element that you want in the hierarchy:

esearch -db nuccore -query "NM_182762.3" | efetch -format xml | xtract -pattern Person-id -element Person-id_ml | head -10

Ren B
Zakharov V
Yang Q
McMahon L
Yu J
Cao W
Xie C 
Wu J
Yun J
Lai J
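The effect of the -pattern level can be sketched in Python with ElementTree. This approximates the behaviour described above and is not xtract's actual code. The input is a hand-written stand-in, and the Record wrapper is hypothetical. Using Person-id as the pattern gives one author per line. A higher-level pattern puts every name inside it on a single line, which is closer to the one-row-per-record table in the question.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for two author entries; the Record wrapper is hypothetical.
SAMPLE = """<Record>
  <Author><Author_name><Person-id><Person-id_ml>Ren B</Person-id_ml></Person-id></Author_name></Author>
  <Author><Author_name><Person-id><Person-id_ml>Zakharov V</Person-id_ml></Person-id></Author_name></Author>
</Record>"""


def extract(xml_text, pattern, element):
    """One output line per `pattern` element: text of every `element` inside it, tab-separated."""
    root = ET.fromstring(xml_text)
    # Element.iter includes the element itself, so the root can match `pattern` too.
    return ["\t".join(e.text or "" for e in p.iter(element)) for p in root.iter(pattern)]


print(extract(SAMPLE, "Person-id", "Person-id_ml"))  # one author per line
print(extract(SAMPLE, "Record", "Person-id_ml"))     # all authors on one line
```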

It's worth spending some time with the complete EDirect documentation rather than just the simplified introduction.
