How to correctly use E-fetch from E-utilities?
7.4 years ago
jaynaythan ▴ 10

Hi,

I am looking to download metadata from the SRA.

I downloaded E-utilities following these instructions

https://www.ncbi.nlm.nih.gov/books/NBK179288/

However, I cannot get efetch to work.

I suspect I am using a different version of efetch from everyone else. Here is the help page from my efetch:

 EFETCH - retrieve entries from sequence databases.

  Synopsis: efetch -options [database:]<query>

  Databases:  SWissprot/SP, PIR, WOrmpep/WP, EMbl, GEnbank/GB, ProDom, ProSite

  Options:
    -a            Search with Accession number
    -f            Fasta format output
    -q            Sequence only output (one line)
    -s <#>        Start at position #
    -e <#>        Stop at position #
    -o            More options and info...

    -D <dir>      Specify database directory
    -H            Display index header data
    -p            Display entrynames in search path
    -r            Print sequence in 'raw' format
    -m            Fetch from mixed mini database
    -M            Mini format output
    -b            Do NOT reverse the order of bytes
                              (SunOS, IRIX do reverse, Alpha not)
    -d <dbfile>   Specify database file (avoid this)
    -i <idxfile>  Specify index file (avoid this)
    -l <divfile>  Specify division lookup table (avoid this)
    -B <database> Specify database (archaic)
    -A            Only return entryname for accession number
    -n <name>     Give the sequence this name
    -x            Don't require query to match entry's name exactly (avoid)
    -w            For Wormpep: also fetch cross-referenced SwissProt entry
    -h            shows this help text


  Environment:
    SWDIR      = SwissProt  directory - database and EMBL index files
    PIRDIR     = PIR        -- " --
    WORMDIR    = Wormpep    -- " --
    EMBLDIR    = EMBL       -- " --
    GBDIR      = Genbank    -- " --
    PRODOMDIR  = ProDom     -- " --
    PROSITEDIR = ProSite    -- " --
    DBDIR      = User's own -- " -- (fasta format)

    SEQDB    database file (default SwissProt)
    SEQDBIDX index file
    DIVTABL  division lookup table

  Ex. setenv DBDIR /pubseq/seqlibs/embl/

  Note that Prodom family consensus seqs can be fetched by PD:_#

  by Erik Sonnhammer (esr@sanger.ac.uk)   Version 2.1,

There is no mention of the -format option, which appears in the commands I find online. For example, these do not work for me:

esearch -db pubmed -query "lycopene cyclase" | efetch -format abstract

esearch -db sra -query SRR5070677 | efetch -format runinfo

In both cases efetch fails, but esearch works fine.

Could anyone help me out?

ncbi E-utilities linux sra • 7.2k views

I am getting exactly the same issue. I installed EDirect by following the steps given in the NCBI manual. My version is "Version 2.1, Dec 13 2015". Is there a newer version that can be downloaded?

7.4 years ago
Sej Modha 5.3k

I am not sure you have installed the E-utilities properly; both commands mentioned above work for me.
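
For anyone hitting the same thing: the help text quoted in the question appears to come from an older, unrelated program that also happens to be called efetch, not from EDirect. A rough way to check which one your shell picks up is sketched below. The helper name classify_efetch_help is made up for illustration, and it only looks for the banner line quoted in the question, so treat its answer as a hint rather than proof.

```shell
#!/usr/bin/env bash
# List every efetch on PATH in lookup order; the first hit is the one that runs.
type -a efetch 2>/dev/null || echo "efetch: not found on PATH"

# classify_efetch_help is a hypothetical helper (not part of EDirect).
# It reads help text on stdin and checks for the banner line that the
# efetch in the question prints.
classify_efetch_help() {
  if grep -q "retrieve entries from sequence databases"; then
    echo "legacy"   # the older sequence-database efetch shown in the question
  else
    echo "other"    # possibly EDirect; confirm by checking that -format is accepted
  fi
}

# Feed it the help output of whichever efetch is found first, if any.
if command -v efetch >/dev/null 2>&1; then
  efetch -h </dev/null 2>&1 | classify_efetch_help
fi
```

If this reports "legacy", the EDirect directory is probably missing from PATH or listed after the directory that holds the other efetch.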

Oh, yes. You are right. Not sure what went wrong.

One nice way of installing eutils is to use Homebrew (http://brew.sh) or Linuxbrew (http://linuxbrew.sh). Then you can run brew install homebrew/science/edirect
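
If EDirect is already installed by hand, another option is to make sure its directory comes first on PATH. The sketch below assumes the install directory is $HOME/edirect (the usual default of the NCBI installer, but check your own setup); EDIRECT_DIR is just a variable name chosen here for illustration.

```shell
# Assumption: EDirect lives in $HOME/edirect; set EDIRECT_DIR beforehand
# if it was installed somewhere else.
EDIRECT_DIR="${EDIRECT_DIR:-$HOME/edirect}"

# Prepend rather than append, so that EDirect's efetch is found before any
# same-named program in a system directory.
export PATH="$EDIRECT_DIR:$PATH"

# Make the shell forget any cached location of efetch.
hash -r
```

To make this permanent, the export line would go in your shell startup file, for example ~/.bashrc.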

Great, thanks! There's just really poor documentation of this. I would never have known!

Hi Sej, I wonder if you can help me out.

I am trying to download this data:

https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRR5070677&go=go

But some fields are missing from the esearch results, namely location and host. Do you know if it's possible to include this information?

You can extract that information with a combination of EDirect tools:

esearch -query SRR5070677 -db sra | efetch -format xml > output.xml

This produces an XML file that you can process with:

cat output.xml | xtract -pattern SAMPLE_ATTRIBUTE -element TAG -element VALUE

That, in turn, will produce the output:

strain  J159
collected_by    missing
collection_date 2014
geo_loc_name    USA: MN
host    Homo sapiens
host_disease    pertussis
isolation_source    missing
lat_lon missing
BioSampleModel  Pathogen.cl
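
If you only want a few of these fields, such as host and location, the TAG/VALUE listing above can be filtered with awk. This is a minimal sketch: pick_attributes is a made-up helper name, and it assumes the two columns are separated by a tab, which is how the xtract output above appears to be laid out.

```shell
# pick_attributes is a hypothetical helper: it keeps only the named tags
# from tab-separated TAG<TAB>VALUE lines read on stdin.
# $1 is a comma-separated list of tags to keep.
pick_attributes() {
  awk -F'\t' -v wanted="$1" '
    BEGIN { n = split(wanted, w, ","); for (i = 1; i <= n; i++) keep[w[i]] = 1 }
    ($1 in keep) { print $1 "\t" $2 }
  '
}

# Offline example using three of the lines shown above.
printf 'geo_loc_name\tUSA: MN\nhost\tHomo sapiens\nlat_lon\tmissing\n' |
  pick_attributes "host,geo_loc_name"

# On real data the same filter would go at the end of the pipeline:
#   cat output.xml | xtract -pattern SAMPLE_ATTRIBUTE -element TAG -element VALUE |
#     pick_attributes "host,geo_loc_name"
```

Matching lines are printed in the order they appear in the input, not in the order the tags are listed.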

Hi, the runinfo file downloaded from this link would not contain the host information either. You can try fetching the data in XML format instead and parsing the host-related information from there.
