Question

Where can I download a "whole bacteria database"?

8.3 years ago

felix.kuo.1211 ▴ 10

I need a database of whole bacterial genomes to map reads against, and it should be unique (non-redundant).

I have tried the NCBI list at ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/bacteria/assembly_summary.txt but its entries are not unique.

RNA-Seq • 4.4k views

ADD COMMENT • link 8.3 years ago by felix.kuo.1211 ▴ 10

How would you define unique? All the proteins are exactly identical? 95% identity over 95% of the proteome? 80%? Do you see where I'm going here?

ADD REPLY • link 8.3 years ago by Asaf 10k

I want to find a database like NCBI nt, but nt is not specific to bacteria. So I tried ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/bacteria/assembly_summary.txt but it has a lot of repeats. I am looking for one with no repeats; that is what I mean by unique.

ADD REPLY • link 8.3 years ago by felix.kuo.1211 ▴ 10

So you want to remove sequences with 100% identity. Do you need the transcripts or whole genomes? I think there is no easy way to select a subset of the assemblies, but you can download all of them and screen out duplicates in a smart way (maybe using MUMmer or similar software).
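For the strict "100% identity" case, a rough sketch in Python could hash each sequence and keep only the first copy. Note that this is plain hashing, not MUMmer: it only catches byte-for-byte identical sequences (ignoring case and line wrapping) and will miss reverse complements, rotated circular genomes, and near-identical strains. The function names and the toy records below are made up for illustration.

```python
import hashlib
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def drop_exact_duplicates(records):
    """Keep the first record for each distinct (case-insensitive) sequence."""
    seen = set()
    for name, seq in records:
        digest = hashlib.sha256(seq.upper().encode("ascii")).hexdigest()
        if digest in seen:
            continue  # an identical sequence was already kept
        seen.add(digest)
        yield name, seq


# Toy input (hypothetical records): genome_B repeats genome_A with
# different case and line wrapping, genome_C differs by one base.
TOY_FASTA = (
    ">genome_A\nACGTACGT\nTTGA\n"
    ">genome_B\nacgtacgtttga\n"
    ">genome_C\nACGTACGTTTGC\n"
)

kept = list(drop_exact_duplicates(read_fasta(io.StringIO(TOY_FASTA))))
for name, seq in kept:
    print(name, len(seq))
```

Storing only the digest keeps memory use small even for many genomes, but for anything looser than exact identity an alignment-based tool would still be needed.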

ADD REPLY • link 8.3 years ago by Asaf 10k

I need whole bacterial genomes. Is there any way to download them?

ADD REPLY • link 8.3 years ago by felix.kuo.1211 ▴ 10

score 3 · Answer 1 · 2016-08-15

8.3 years ago

Sej Modha 5.3k

You might find this post useful.

ADD COMMENT • link 8.3 years ago by Sej Modha 5.3k

Yes, I did use this, but it is not unique. It has many repeats.

ADD REPLY • link 8.3 years ago by felix.kuo.1211 ▴ 10

There is no non-redundant database of bacterial genomes. You would need to make one yourself. While there may be a few repeats, many of the genomes are likely different strains of a particular species, and so may appear redundant.

ADD REPLY • link 8.3 years ago by GenoMax 147k

Can you show me how to make one? Thanks.

ADD REPLY • link 8.3 years ago by felix.kuo.1211 ▴ 10

Are you only looking for "refseq" genomes (ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/assembly_summary_refseq.txt; the actual data will be in ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/ ) or all bacterial genomes?

In any case, this would involve getting a list of available genomes following @5heikki's recipe (C: Download All The Bacterial Genomes From Ncbi ), then parsing that list to make a non-redundant set, downloading the FASTA genomes for that set, and making BLAST indexes.
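As a rough illustration of the "parsing that list" step, here is a minimal Python sketch that keeps one assembly per species. It assumes that "non-redundant" means "one genome per species", which is only one possible definition, and that the summary file has a tab-separated header line containing the columns `assembly_accession`, `refseq_category`, `species_taxid`, `assembly_level` and `ftp_path`, as the NCBI assembly summary files do. The ranking rule (reference, then representative, then the most complete assembly level) is my own choice, and the accessions and paths in the toy input are made up.

```python
import io

# Lower rank wins. These preferences are an assumption, not an NCBI rule.
CATEGORY_RANK = {"reference genome": 0, "representative genome": 1}
LEVEL_RANK = {"Complete Genome": 0, "Chromosome": 1, "Scaffold": 2, "Contig": 3}


def read_summary(handle):
    """Yield one dict per assembly from an assembly_summary-style file."""
    header = None
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("#"):
            fields = line.lstrip("#").strip().split("\t")
            if "assembly_accession" in fields:
                header = fields  # this comment line is the column header
            continue
        if header is None or not line:
            continue
        yield dict(zip(header, line.split("\t")))


def pick_one_per_species(rows):
    """Keep the best-ranked assembly for each species_taxid."""
    best = {}
    for row in rows:
        key = row["species_taxid"]
        rank = (
            CATEGORY_RANK.get(row["refseq_category"], 2),
            LEVEL_RANK.get(row["assembly_level"], 4),
        )
        if key not in best or rank < best[key][0]:
            best[key] = (rank, row)
    return [row for _, row in best.values()]


# Toy input with hypothetical accessions and paths, reduced to five columns.
TOY_SUMMARY = "\n".join([
    "#   toy example, not real NCBI data",
    "# assembly_accession\trefseq_category\tspecies_taxid\tassembly_level\tftp_path",
    "GCA_000000001.1\tna\t562\tContig\tftp://example.invalid/a",
    "GCA_000000002.1\tna\t562\tComplete Genome\tftp://example.invalid/b",
    "GCA_000000003.1\trepresentative genome\t1423\tScaffold\tftp://example.invalid/c",
    "GCA_000000004.1\tna\t1423\tComplete Genome\tftp://example.invalid/d",
]) + "\n"

chosen = pick_one_per_species(read_summary(io.StringIO(TOY_SUMMARY)))
for row in chosen:
    print(row["assembly_accession"], row["species_taxid"], row["ftp_path"])
```

The `ftp_path` values of the selected rows would then be the directories to download the FASTA files from. Grouping by `taxid` instead of `species_taxid` would keep strains separate if that matters for your mapping.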

ADD REPLY • link 8.3 years ago by GenoMax 147k

I did use @5heikki's recipe. But how do I do the "parsing that list to make a non-redundant set" step? I want to use bowtie2.

ADD REPLY • link 8.3 years ago by felix.kuo.1211 ▴ 10