How to download a large amount of data from NCBI?
4
0
Entering edit mode
5.2 years ago
matheus.sf • 0

Hey guys, I really need some help with a database download. I need to download the entire nucleotide database for archaea from NCBI (about 2.3 million sequences). When I tried to download directly from the site (Send to > Complete Record...), it simply didn't work: the download always failed, and I gave up after about 5 attempts. Then I tried NCBI Mass Sequence Downloader, and that didn't work either. When I click Save As (after entering the query in Search Query), the program just exits.

So here I am, asking anyone who can help. What is the simplest way to download a large amount of data from NCBI? I need a detailed tutorial because I'm clearly not able to figure this out on my own.

Thanks

sequence • 1.4k views
5.2 years ago
Brice Sarver ★ 3.8k

It looks like you're getting some good information on your Reddit post about the same question. My personal recommendation would be to go with the Entrez E-utilities. You can find the usage guide here.
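
As a rough illustration of the E-utilities route, here is a minimal Python sketch that runs one `esearch` with the history server enabled and then pulls FASTA records in batches with `efetch`. The query term `txid2157[Organism:exp]`, the batch size, the pause between requests, and all function names are my own assumptions for illustration and are not from the original post; check the usage guide for current limits before running it on millions of records.

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term, db="nuccore"):
    """Build an esearch URL that stores the result set on the history server."""
    params = {"db": db, "term": term, "usehistory": "y", "retmax": 0}
    return EUTILS + "esearch.fcgi?" + urllib.parse.urlencode(params)


def efetch_url(webenv, query_key, retstart, retmax, db="nuccore"):
    """Build an efetch URL for one batch of FASTA records from a stored search."""
    params = {
        "db": db,
        "WebEnv": webenv,
        "query_key": query_key,
        "retstart": retstart,
        "retmax": retmax,
        "rettype": "fasta",
        "retmode": "text",
    }
    return EUTILS + "efetch.fcgi?" + urllib.parse.urlencode(params)


def batch_starts(total, batch_size):
    """Return the retstart offsets needed to cover `total` records."""
    return list(range(0, total, batch_size))


def download_fasta(term, out_path, batch_size=500, pause=0.4):
    """Search once, then append each batch of FASTA text to out_path.

    batch_size and pause are assumed values; the pause is meant to stay
    below NCBI's request rate limit for clients without an API key.
    """
    with urllib.request.urlopen(esearch_url(term)) as resp:
        root = ET.parse(resp).getroot()
    total = int(root.findtext("Count"))
    webenv = root.findtext("WebEnv")
    query_key = root.findtext("QueryKey")

    with open(out_path, "wb") as out:
        for start in batch_starts(total, batch_size):
            url = efetch_url(webenv, query_key, start, batch_size)
            with urllib.request.urlopen(url) as resp:
                out.write(resp.read())
            time.sleep(pause)
```

Calling `download_fasta("txid2157[Organism:exp]", "archaea.fasta")` would start the download. The sketch has no retry logic, so for a job this size you would probably want to record the last completed offset and resume from there after a failure.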

5.2 years ago
noodle ▴ 590

It seems like you're trying to do this through a web browser, which will just lead to frustration (and won't work). If you're going to be working with "big data", you'll absolutely need to learn how to use some basic command-line tools in a terminal. I recommend spending a few hours learning how to use the NCBI interface tools via their GitHub page and, if this sparks your interest, checking out the Aspera and/or AWS and/or Google Cloud download utilities.

5.2 years ago
GenoMax 148k

Besides the NCBI E-utilities options already noted, you can also take a look at Kai Blin's NCBI genome download tool. You can pass the taxID 2157 (archaea) to the tool to download all archaeal genomes.

In any case, this will be a large download and there is no way around that. Keep that in mind if you are constrained by bandwidth or data limits.
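
For the genome download tool mentioned above, a small Python wrapper around its command line might look like the sketch below. It assumes the `ncbi-genome-download` executable is installed and on your PATH, and it selects archaea through the tool's `archaea` group argument instead of a taxID filter. That choice, the flags used (`--section`, `--formats`, `--dry-run`), and the helper names are my assumptions based on my reading of the tool's README, so verify them with `ncbi-genome-download --help`. Note that this fetches assembled genomes, not every nucleotide record.

```python
import shutil
import subprocess


def build_command(group="archaea", section="genbank",
                  file_format="fasta", dry_run=True):
    """Assemble the ncbi-genome-download argument list (flags are assumed)."""
    cmd = ["ncbi-genome-download", "--section", section,
           "--formats", file_format]
    if dry_run:
        # Only list what would be downloaded, without fetching anything.
        cmd.append("--dry-run")
    cmd.append(group)
    return cmd


def run(cmd):
    """Run the command, failing early if the tool is not installed."""
    if shutil.which(cmd[0]) is None:
        raise RuntimeError(cmd[0] + " not found on PATH; install it first")
    subprocess.run(cmd, check=True)
```

Running `run(build_command())` first gives a dry run so you can see how much data is involved; `run(build_command(dry_run=False))` would then start the actual download.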

5.2 years ago
natasha.sernova ★ 4.0k

Always remember that there was an old copy of NCBI from before 2014; see my answer in the post below.

where can I get environmental bacteria genome in fasta format (as many as possible)?

The genomes were organized compactly there, and downloading the archaeal or bacterial archive was really easy.

By the way, I once saw how slowly new archaeal genomes were being added to NCBI, so the old version may still be OK for these particular species.
