Question

All completed genomes for a phylum

9.1 years ago

frenchmytoast112 ▴ 10

Hello everyone,

Is it possible to find all the completed genomes for a certain phylum and obtain their accession numbers?
I found some questions and answers here about it, but they all seem outdated or unclear. For example, I would like to access all the finished, complete genomes for Acidobacteria from NCBI.

genome • 2.7k views

updated 9.1 years ago by Erik Wright ▴ 420 • written 9.1 years ago by frenchmytoast112 ▴ 10

I just used NCBI Taxonomy Database to see all sequence data for Acidobacteria (see: here). To access the list of all genomes for this phylum, click on the Assembly link.

updated 5.2 years ago by Ram 44k • written 9.1 years ago by Andrzej Zielezinski 11k

That's Assembly. Are they completed genomes? I am not sure that's what I was asking for.

9.1 years ago by frenchmytoast112 ▴ 10

Ram · Accepted Answer · 2016-01-21

NCBI has a list of genomes by organism. Enter "Acidobacteria" then click "Search by organism". Next, select the prokaryotes tab. Sort by the "Level" column. Completed genomes have a filled black circle.

If the list is too long to click each genome individually, there is a link near the top that says "Download selected records". This produces a CSV file with multiple columns, including the RefSeq and GenBank FTP directory addresses. You can then write a script to download the file(s) that you want from each of those directories.

For example, the R script I use for downloading the FASTA files looks like this:

# read in the file downloaded from the NCBI genome browser
x <- read.csv("<<PATH TO genomes_proks.csv>>", stringsAsFactors=FALSE)
ftps <- x$GenBank.FTP

# select a subset of FTPs if desired (here: complete genomes only)
ftps <- ftps[which(x$Level=="Complete Genome")]

# set the input and output file locations:
# each FASTA file is named <directory name>_genomic.fna.gz
ftps <- paste(ftps,
    paste0(sapply(strsplit(ftps, "/", fixed=TRUE), tail, n=1),
        "_genomic.fna.gz"),
    sep="/")
# save each file under its own name (the last component of the full URL)
saveto <- paste0("~/Downloads/",
    sapply(strsplit(ftps, "/", fixed=TRUE), tail, n=1))

# download each of the genomes to ~/Downloads/
pBar <- txtProgressBar(style=3)
for (i in seq_along(ftps)) {
    # mode="wb" keeps the gzipped files intact on Windows
    download.file(ftps[i], saveto[i], mode="wb")
    setTxtProgressBar(pBar, i/length(ftps))
}
close(pBar)
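
If the URL-building step above is hard to follow, here is a minimal sketch of the same idea as a small helper that you can check on one directory before downloading anything. The function name `fasta_url` and the example directory are made up for illustration; the only assumption carried over from the script is that each FASTA file is named after its directory, with `_genomic.fna.gz` appended.

```r
# Build the FASTA URL for one or more assembly directories.
# Assumes the layout used in the script above:
#   <directory>/<directory name>_genomic.fna.gz
fasta_url <- function(ftp_dir) {
    ftp_dir <- sub("/+$", "", ftp_dir)  # drop any trailing slash
    paste0(ftp_dir, "/", basename(ftp_dir), "_genomic.fna.gz")
}

# hypothetical directory, for illustration only (not a real assembly)
d <- "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/000/000/GCA_000000000.1_EXAMPLEv1"
u <- fasta_url(d)

print(u)            # full URL that would be passed to download.file()
print(basename(u))  # file name that would be used under ~/Downloads/
```

Printing a few of these before starting the loop is a cheap way to spot malformed entries in the FTP column.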

Hope that helps!