NCBI: download all genomes obtained from soil/marine/host-associated bacteria/organisms
7.3 years ago
dago ★ 2.8k

Let's say I would like to download from NCBI all genomes obtained from marine bacteria (or soil- or gut-associated ones). I figured that E-utilities could work for me.

To get information about the environmental source, I need to check the BioSample. So I would do something like:

esearch -db biosample -query "marine" | efetch -format tabular

1: Photobacterium sanguinicancer CAIM 1827T
Identifiers: BioSample: SAMN04252530; Sample name: CAIM1827T.1; SRA: SRS1159004
Organism: Photobacterium sanguinicancri
Attributes:
    /strain="CAIM 1827"
    /host="Maja brachydactyla"
    /isolation source="Hemolymph"
    /collection date="06-Dec-2005"
    /geographic location="Spain: Ria a Coruna"
    /sample type="Bacterium"
    /altitude="0 m"
    /biomaterial provider="Collection of Aquatic Important Microorganisms"
    /culture collection="not applicable"
    /environment biome="marine"
    /host tissue sampled="hemolymph"
    /identified by="Bruno Gomez-Gil"
    /latitude and longitude="43.21 N 8.2200 W"
    /specimen voucher="not applicable"
Description:
    Draft genome of Photobacterium sanguinicancer type strain CAIM 1827T
    Accession: SAMN04252530 ID: 4252530
.....
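To carry these hits into a later step, the BioSample accessions can be pulled out of the tabular report. This is a minimal sketch that assumes the report keeps the `Identifiers: BioSample: SAMN...;` layout shown above. The function name `extract_biosample_ids` is hypothetical:

```shell
# Hypothetical helper: read an `efetch -format tabular` BioSample report on stdin
# and print one BioSample accession (SAMN..., SAMEA..., SAMD...) per line.
# Assumes the "Identifiers: BioSample: <accession>;" layout shown in the report.
extract_biosample_ids() {
  grep -oE 'BioSample: SAM[A-Z]*[0-9]+' | awk '{ print $2 }'
}
```

For example: `esearch -db biosample -query "marine" | efetch -format tabular | extract_biosample_ids > biosamples.txt` (the output file name is arbitrary).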

Now I would like to either download these assemblies/SRA data or access them, and this is making me quite confused. As far as I can tell, I could use efetch to retrieve sequences. However, there seems to be no direct link between querying BioSamples and accessing the data via E-utilities.

Could someone out there illuminate me?
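For the SRA side, EDirect can follow the BioSample → SRA link and return the run table. This is a minimal sketch. It assumes EDirect (`esearch`, `elink`, `efetch`) is on the PATH. It also assumes `efetch -db sra -format runinfo` returns CSV with the run accession in a column named `Run`. Both function names are hypothetical:

```shell
# Hypothetical helper: print the SRA run table (CSV) for a BioSample query.
# Requires NCBI EDirect on PATH; makes network requests when called.
biosample_runinfo() {
  esearch -db biosample -query "$1" | elink -target sra | efetch -format runinfo
}

# Hypothetical helper: read runinfo CSV on stdin and print the "Run" column,
# skipping blank lines and any repeated header lines.
runinfo_run_ids() {
  awk -F',' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "Run") col = i; next }
    col && $col != "" && $col != "Run" { print $col }
  '
}
```

The run accessions printed by `biosample_runinfo SAMN04252530 | runinfo_run_ids` can then be passed to the SRA Toolkit (`prefetch`, `fasterq-dump`) to get the reads.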

genome database sequence next-gen • 2.4k views
7.3 years ago

there seems to be no direct link between querying BioSamples and accessing the data via E-utilities.

Could someone out there illuminate me?

You have to call ELink: https://www.ncbi.nlm.nih.gov/books/NBK25499/#_chapter4_ELink_

$ wget -q -O - "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=biosample&db=taxonomy&id=6350818"

<!DOCTYPE eLinkResult PUBLIC "-//NLM//DTD elink 20101123//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20101123/elink.dtd">
<eLinkResult>
  <LinkSet>
    <DbFrom>biosample</DbFrom>
    <IdList>
      <Id>6350818</Id>
    </IdList>
    <LinkSetDb>
      <DbTo>taxonomy</DbTo>
      <LinkName>biosample_taxonomy</LinkName>
      <Link>
        <Id>408172</Id>
      </Link>
    </LinkSetDb>
  </LinkSet>
</eLinkResult>

Here, BioSample 6350818 (https://www.ncbi.nlm.nih.gov/biosample/?term=6350818) is linked to taxon 408172, "marine metagenome" (https://www.ncbi.nlm.nih.gov/taxonomy/?term=408172).
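The same call can be wrapped so that the source database, target database and UID become parameters. The linked UIDs can then be pulled out of the reply. This is a minimal sketch, and both function names are hypothetical. The parser is a line-based shortcut that assumes one tag per line, as in the reply above. It is not a real XML parser, so `xmllint` or EDirect's `xtract` is the sturdier choice:

```shell
# Hypothetical helper: build an ELink URL from dbfrom, db and id.
elink_url() {
  printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=%s&db=%s&id=%s\n' "$1" "$2" "$3"
}

# Hypothetical helper: print the linked UIDs from an ELink XML reply on stdin.
# Only <Id> elements inside <LinkSetDb> are printed, so the query UID listed
# under <IdList> is skipped. Assumes one tag per line.
elink_linked_ids() {
  awk '
    /<LinkSetDb>/   { inside = 1 }
    /<\/LinkSetDb>/ { inside = 0 }
    inside && /<Id>/ {
      gsub(/.*<Id>|<\/Id>.*/, "")
      print
    }
  '
}
```

Usage: `wget -q -O - "$(elink_url biosample taxonomy 6350818)" | elink_linked_ids`. On the reply shown above, the parser picks out 408172.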


Hi Pierre, thanks very much for the answer. I looked into it, and this is how far I got. Starting from a BioSample, I can get the link to the assembly, for example:

esearch -db biosample -query "SAMN06971996" | elink -target assembly 
<ENTREZ_DIRECT>
  <Db>assembly</Db>
  <WebEnv>NCID_1_14264864_130.14.22.215_9001_1501173337_1595509801_0MetA0_S_MegaStore_F_1</WebEnv>
  <QueryKey>3</QueryKey>
  <Count>1</Count>
  <Step>2</Step>
</ENTREZ_DIRECT>

However, I still cannot figure out how to access the actual sequence, as efetch won't work.
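As far as I know, the Assembly database does not serve sequences through `efetch`. Its document summary does, however, carry the path of each assembly's directory on the NCBI FTP site. The genome FASTA sits in that directory as `<directory name>_genomic.fna.gz`.

This is a minimal sketch that assumes EDirect's `esummary`/`xtract` and the `FtpPath_GenBank` summary field. Both function names are hypothetical. `assembly_fasta_url` also rewrites `ftp://` to `https://`, since the same tree is served over HTTPS:

```shell
# Hypothetical helper: print the FtpPath_GenBank directory of each assembly
# linked to a BioSample query. Requires NCBI EDirect; makes network requests.
biosample_assembly_dirs() {
  esearch -db biosample -query "$1" |
    elink -target assembly |
    esummary |
    xtract -pattern DocumentSummary -element FtpPath_GenBank
}

# Hypothetical helper: read assembly directory paths on stdin (one per line)
# and print the URL of the *_genomic.fna.gz file inside each directory.
assembly_fasta_url() {
  while IFS= read -r dir; do
    [ -n "$dir" ] || continue            # no GenBank path for this record
    dir=${dir%/}                         # drop a trailing slash, if any
    case $dir in
      ftp://*) dir="https://${dir#ftp://}" ;;
    esac
    base=${dir##*/}                      # directory name, e.g. GCA_..._ASMname
    printf '%s/%s_genomic.fna.gz\n' "$dir" "$base"
  done
}
```

Usage: `biosample_assembly_dirs SAMN06971996 | assembly_fasta_url | xargs -n 1 wget -q` downloads one gzipped FASTA per linked assembly.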

# BioSample accession -> BioSample UID (ESearch) -> Taxonomy UID (ELink) -> scientific name (EFetch)
(
  (
    wget -q -O - "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=biosample&term=SAMN06971996" |
      xmllint --xpath '/eSearchResult/IdList/Id[1]/text()' - && echo
  ) |
    xargs -I '{}' wget -q -O - "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=biosample&db=taxonomy&id={}" |
    xmllint --xpath '/eLinkResult/LinkSet/LinkSetDb/Link/Id[1]/text()' - && echo
) |
  xargs -I '{}' wget -q -O - "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=taxonomy&id={}" |
  xmllint --xpath '/TaxaSet/Taxon/ScientificName/text()' -


Candidatus Pelagibacter sp. TMED142
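For comparison, the same BioSample → Taxonomy → name lookup can be written with EDirect, which carries the UIDs between steps by itself. This is a minimal sketch that assumes EDirect is installed. It also assumes the Taxonomy document summary exposes a `ScientificName` field. The function name is hypothetical:

```shell
# Hypothetical helper: print the scientific name of the taxon linked to a
# BioSample accession (same lookup as the wget/xmllint pipeline).
# Requires NCBI EDirect on PATH; makes network requests when called.
biosample_taxon_name() {
  esearch -db biosample -query "$1" |
    elink -target taxonomy |
    esummary |
    xtract -pattern DocumentSummary -element ScientificName
}
```

Called as `biosample_taxon_name SAMN06971996`, it should return the same name as the wget/xmllint pipeline, provided the link still resolves to the same taxon.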

I guess your answers are too inscrutable for me to understand! :)

7.3 years ago
7.3 years ago
Charles Plessy ★ 2.9k

Once you have the name or identifiers of the species you want to download, have a look at the following discussion: C: Download All The Bacterial Genomes From Ncbi.
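Another route avoids per-record E-utility calls: NCBI's `assembly_summary.txt` table, for example https://ftp.ncbi.nlm.nih.gov/genomes/genbank/bacteria/assembly_summary.txt. This table lists every assembly together with its BioSample accession and FTP directory.

This is a minimal sketch that assumes the tab-separated layout described in the genomes FTP README, with the BioSample accession in column 3 and `ftp_path` in column 20. The layout may change, so check the `#` header line of the file you download. `select_assemblies` and `biosamples.txt` are hypothetical names:

```shell
# Hypothetical helper: print the ftp_path of every assembly whose BioSample
# accession is listed in the file given as $1 (one accession per line).
# The assembly summary table is read from stdin.
# Assumes: tab-separated, BioSample in column 3, ftp_path in column 20.
select_assemblies() {
  awk -F'\t' '
    NR == FNR { wanted[$1] = 1; next }   # first input: accession list
    /^#/      { next }                   # skip comment/header lines
    ($3 in wanted) && $20 != "na" { print $20 }
  ' "$1" -
}
```

Usage: `wget -q -O - https://ftp.ncbi.nlm.nih.gov/genomes/genbank/bacteria/assembly_summary.txt | select_assemblies biosamples.txt`. Each printed directory should contain a `<directory name>_genomic.fna.gz` file.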
