Cannot get efetch to download genome - what is wrong?
6.7 years ago
BioBing ▴ 150

Hi all,

I am trying to download a genome assembly (Bioproject: PRJEB20069, assembly: GCA_900241095.1) in FASTA format using the Entrez utilities from the command line, but it continues to fail:

efetch -db=nuccore -format=fasta -id=GCA_900241095.1 > output.fa

I have tried various things (using the BioProject number, getting information with esearch, etc.), but nothing seems to work.

Can any of you see what I am doing wrong?

Thank you!

Best wishes, Birgitte

entrez genome efetch fasta • 3.9k views

There is no such accession in nuccore:

$ wget  -q -O - "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nuccore&term=GCA_900241095.1"


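<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE eSearchResult PUBLIC "-//NLM//DTD esearch 20060628//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20060628/esearch.dtd">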
<eSearchResult>
  <Count>0</Count>
  <RetMax>0</RetMax>
  <RetStart>0</RetStart>
  <IdList/>
  <TranslationSet/>
  <QueryTranslation>(GCA_900241095.1[All Fields])</QueryTranslation>
  <ErrorList>
    <PhraseNotFound>GCA_900241095.1</PhraseNotFound>
  </ErrorList>
  <WarningList>
    <OutputMessage>No items found.</OutputMessage>
  </WarningList>
</eSearchResult>
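GCA_ accessions are assembly accessions, so they are indexed in the assembly database rather than in nuccore. Before calling efetch, it can help to check the hit count that esearch reports for a given database. Below is a minimal sketch of that check; the helper name `esearch_count` is made up for illustration, and it assumes the response is laid out like the XML above (one `<Count>` element per line).

```shell
# Print the hit count from an esearch XML response read on stdin.
# Hypothetical helper: only the first <Count> element is used,
# which holds the total number of hits for the query.
esearch_count() {
  sed -n 's:.*<Count>\([0-9][0-9]*\)</Count>.*:\1:p' | head -n 1
}

# Example (needs network access), checking the assembly database instead of nuccore:
#   wget -q -O - "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=assembly&term=GCA_900241095.1" | esearch_count
```

A count of 0, as in the nuccore response above, means efetch has nothing to retrieve from that database for that term.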
6.7 years ago
GenoMax 147k

You may want to try this too:

esearch -db bioproject -query "PRJEB20069" \
  | elink -target assembly \
  | efetch -format docsum \
  | xtract -pattern DocumentSummary -element FtpPath_GenBank \
  | xargs -n 1 sh -c 'wget "$0"/*fna.gz'
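For annotated assemblies, the `*fna.gz` wildcard can also match extra files such as `*_cds_from_genomic.fna.gz` or `*_rna_from_genomic.fna.gz`. If only the genome sequence is wanted, the file name can be built from the FTP path, since the main FASTA file in an NCBI assembly directory is named after the directory itself with a `_genomic.fna.gz` suffix. Here is a rough sketch; `genomic_fna_url` is a hypothetical helper name, not part of EDirect.

```shell
# Build the URL of the main genomic FASTA file from an assembly FTP directory path.
# Hypothetical helper: assumes the usual NCBI layout <dir>/<dir name>_genomic.fna.gz.
genomic_fna_url() {
  ftp_path=${1%/}    # drop a trailing slash, if any
  printf '%s/%s_genomic.fna.gz\n' "$ftp_path" "${ftp_path##*/}"
}

# Example (needs network access and EDirect):
#   esearch -db bioproject -query "PRJEB20069" \
#     | elink -target assembly \
#     | efetch -format docsum \
#     | xtract -pattern DocumentSummary -element FtpPath_GenBank \
#     | while read -r p; do wget "$(genomic_fna_url "$p")"; done
```

The wildcard version is shorter; this variant is only worth it when the directory holds more than one `fna.gz` file.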

This is perfect! Thank you!


That's some serious ncbi-fu stuff. I've just broken it into several lines to make it more readable.

6.7 years ago
BioBing ▴ 150

You are right! Thanks!

I think I found a way to download it using:

esearch -db bioproject -query "PRJEB20069" | elink -target nuccore | efetch -format fasta > output.fa
6.7 years ago
h.mon 35k

I think you are downloading more than just the assembly. To download only the assembly scaffolds:

esearch -db nucleotide -query "LS041563[PACC]:LS041565[PACC]" \
  | efetch -format fasta
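A quick way to see whether a download contains more than the assembly is to count the FASTA headers in the output file. The sketch below assumes the output was redirected to a file such as `output.fa`; `count_fasta_records` is a made-up helper name.

```shell
# Count FASTA records (header lines starting with ">") in a file.
# Hypothetical helper for sanity-checking a download.
count_fasta_records() {
  grep -c '^>' "$1"
}

# Example:
#   count_fasta_records output.fa
#   grep '^>' output.fa | head    # look at the first few headers
```

The accession range LS041563 to LS041565 spans three accessions, so that query should return at most three records, whereas the BioProject-wide download can be expected to return more.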

You are right! Thank you for noticing that! :)
