Question

How To Download Full Genome Sequence

0


13.5 years ago

Palu ▴ 250

Hi,

I want to download all the genes of fully sequenced genomes. Is there an easy way to do that?

Thanks
palu

genome sequence • 24k views

updated 20 months ago by Ram 44k • written 13.5 years ago by Palu ▴ 250

3


Can you clarify your question? First, do you want the full genome sequence, as your title suggests, or the genes, as the text suggests? Second, as you may know, there are now thousands of "fully sequenced genomes", so you may want to narrow it down to a certain subset. (Unless it's a pretty specific subset that you want, the answer to your question, as is, is simply: no.)

13.5 years ago by brentp 24k

0


Use the internet, silly! :o)

13.5 years ago by Martin A Hansen 3.0k

0


Actually, I want to download the genome sequences of organisms whose genomes are completely sequenced.

13.5 years ago by Palu ▴ 250

score 6 · Answer 1 · 2011-06-15

6


13.5 years ago

hadasa ★ 1.0k

Different genomes have been sequenced by different institutes with different motivations and interests. As such, there is no single site where you can find all the genome information that you may want.

That said, NCBI is a good place to start, as it curates the GenBank database, whose contents are mirrored and exchanged with other sequence databases such as EMBL and DDBJ.

Please also have a look at http://www.ncbi.nlm.nih.gov/sites/genome, and at ftp://ftp.ncbi.nlm.nih.gov/genomes/ to download genome data for various organisms.

I would suggest you refine your question to be more specific.


1


Well, most downloads occur "one by one". If you want downloads to run unattended, you simply use an FTP site with a command such as "mget", or an rsync server, or write a small shell script.
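For the scripted route, a minimal sketch in R might look like the following. It assumes you already have a list of NCBI assembly directories, and that each directory contains a file named `<directory name>_genomic.fna.gz`, which matches the layout under ftp://ftp.ncbi.nlm.nih.gov/genomes/all/. The helper names `genomic_fasta_url` and `download_genomes` and the `genomes` folder are made up for illustration.

```r
# Build the URL of the genomic FASTA file inside an NCBI assembly directory.
# Assumption: the file is named "<directory name>_genomic.fna.gz".
genomic_fasta_url <- function(assembly_dir) {
  assembly_dir <- sub("/+$", "", assembly_dir)  # drop any trailing slash
  paste0(assembly_dir, "/", basename(assembly_dir), "_genomic.fna.gz")
}

# Download every genome in turn. A failed download is reported and skipped,
# so that one bad URL does not stop an unattended run.
download_genomes <- function(assembly_dirs, dest_dir = "genomes") {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  urls <- genomic_fasta_url(assembly_dirs)
  for (u in urls) {
    dest <- file.path(dest_dir, basename(u))
    if (file.exists(dest)) next  # already fetched on an earlier run
    tryCatch(
      utils::download.file(u, destfile = dest, mode = "wb"),
      error = function(e) {
        unlink(dest)  # remove any partially written file
        message("Failed: ", u, " (", conditionMessage(e), ")")
      }
    )
  }
  invisible(file.path(dest_dir, basename(urls)))
}

# Example call (not run here, since it needs network access):
# download_genomes("ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.35_GRCh38.p9")
```

Skipping files that already exist means the script can simply be restarted after an interruption.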

13.5 years ago by Neilfws 49k

0


Actually, if I want to download the genome sequences for 200 organisms, for example, it would not be wise to do so one by one. Therefore, I am looking for a convenient way to do so.

13.5 years ago by Palu ▴ 250

score 2 · Answer 2 · 2017-02-10

Hi,

I also struggled to find a standardized way to automate the genome retrieval process for subsequent data analysis or pipelining in genomics studies. So I sat down and wrote an R package named biomartr to fulfill this task. This way, not every study has to use its own home-made shell script to retrieve genomes (which is hard to reproduce if those scripts are not made publicly available).

If you really wish to download all available genes for all sequenced genomes (and here I assume that you mean in the form of coding sequences (CDS) or protein sequences), the biomartr package includes the following functionality:

For example, if you would like to download CDS files and proteome files for all species available in the NCBI RefSeq database, you will find that to date there is data available for almost 8000 fully sequenced species:

biomartr::listKingdoms(db = "refseq")

  Archaea  Bacteria Eukaryota   Viroids   Viruses
       78      1627       425        46      5703

To download the CDS for all ~8000 species, you can type:

# download all CDS stored in RefSeq
biomartr::meta.retrieval.all(db = "refseq", type = "CDS")

To download all protein sequences for all ~8000 species you can type:

# download all proteomes stored in RefSeq
biomartr::meta.retrieval.all(db = "refseq", type = "proteome")

Alternatively, you can download the entire NCBI RefSeq protein database by typing:

# download the entire NCBI refseq (protein) database
biomartr::download.database.all(db = "refseq_protein")

For more details about downloading specific genomes from specific kingdoms or subkingdoms of life please consult the Genomic Sequence Retrieval vignette of the biomartr package. For metagenome downloads, please consult the Meta-Genome Retrieval vignette and for entire database retrieval the Database Retrieval vignette.
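For a hand-picked list of organisms (say, 200 of them), a loop around a single-genome retrieval call is one option. The sketch below assumes that `biomartr::getGenome()` accepts `db`, `organism`, and `path` arguments as described in the package documentation. The wrapper `fetch_genomes` and the folder name `refseq_genomes` are hypothetical, and the call should be checked against the installed biomartr version.

```r
# Fetch genomes for a vector of organism names, one after another.
# `fetch` is a function taking one organism name; by default it calls
# biomartr::getGenome(), but any function can be swapped in.
fetch_genomes <- function(organisms,
                          fetch = function(org) {
                            biomartr::getGenome(db = "refseq",
                                                organism = org,
                                                path = "refseq_genomes")
                          }) {
  results <- lapply(organisms, function(org) {
    # Record NA instead of aborting when one organism cannot be retrieved.
    tryCatch(fetch(org), error = function(e) {
      message("Could not retrieve ", org, ": ", conditionMessage(e))
      NA_character_
    })
  })
  names(results) <- organisms
  results
}

# Example call (needs biomartr installed and network access):
# fetch_genomes(c("Homo sapiens", "Arabidopsis thaliana"))
```

Passing the retrieval step in as a function keeps the loop independent of the data source, so the same wrapper could be pointed at a CDS or proteome retrieval call instead.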

Please note that to promote computational reproducibility in genomics and metagenomics studies, biomartr stores log files for each downloaded genome, proteome, or CDS file.

An example log file looks as follows:

File Name: Homo_sapiens_genomic_refseq.fna.gz

Organism Name: Homo_sapiens

Database: NCBI refseq

URL: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.35_GRCh38.p9/GCF_000001405.35_GRCh38.p9_genomic.fna.gz

Download_Date: Sat Oct 22 12:41:07 2016

refseq_category: reference

genome assembly_accession: GCF_000001405.35

bioproject: PRJNA168

biosample: NA

taxid: 9606

infraspecific_name: NA

version_status: latest

release_type: Patch

genome_rep: Full

seq_rel_date: 2016-09-26

submitter: Genome Reference Consortium
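Such a log can be read back into a lookup table, for example to check which assembly version a file came from. Here is a minimal sketch in R, assuming the log is plain text with one `Key: value` pair per line as in the example above. The function name `parse_log` is hypothetical, and the actual file layout written by biomartr may differ.

```r
# Parse "Key: value" lines into a named character vector.
# Only the first ": " on each line is treated as the separator, so values
# that themselves contain colons (URLs, timestamps) stay intact.
parse_log <- function(lines) {
  lines <- lines[grepl(": ", lines, fixed = TRUE)]  # skip blank/other lines
  pos <- regexpr(": ", lines, fixed = TRUE)
  keys <- trimws(substr(lines, 1, pos - 1))
  values <- trimws(substr(lines, pos + 2, nchar(lines)))
  stats::setNames(values, keys)
}

# A few lines copied from the example log; a real file could be read
# with readLines() instead.
log_lines <- c(
  "Organism Name: Homo_sapiens",
  "Download_Date: Sat Oct 22 12:41:07 2016",
  "genome assembly_accession: GCF_000001405.35",
  "taxid: 9606"
)
info <- parse_log(log_lines)
info[["taxid"]]          # "9606"
info[["Download_Date"]]  # "Sat Oct 22 12:41:07 2016"
```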

I hope that this new functionality provided by biomartr might be useful for your application and for other genomics projects.