Where can I download a "whole bacteria database"?
8.3 years ago

I need a database of whole bacterial genomes for read mapping, and it should be unique.

I have tried the NCBI list at ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/bacteria/assembly_summary.txt but its entries are not unique.

RNA-Seq • 4.4k views

How would you define unique? Are all the proteins exactly identical? 95% identity over 95% of the proteome? 80%? Do you see where I'm going here?


I want to find a database like NCBI nt, but nt is not specific to bacteria. So I tried ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/bacteria/assembly_summary.txt but it has a lot of repeats. I am trying to find one without repeats; that is what I mean by unique.


So you want to remove sequences with 100% identity. Do you need the transcripts or whole genomes? I don't think there is an easy way to select a subset of the assemblies, but you can download all of them and screen for duplicates in a smart way (maybe using MUMmer or similar software).
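
For the 100% identity case specifically, exact duplicates can be screened without an aligner by hashing each sequence. The sketch below is a minimal illustration, not a tested pipeline: `read_fasta` and `drop_exact_duplicates` are hypothetical helper names, and it only catches sequences that are identical character for character (ignoring case). Reverse complements, rotated circular genomes, or near-identical strains would still need an alignment-based tool such as MUMmer.

```python
import hashlib


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_exact_duplicates(records):
    """Keep the first record for each distinct sequence (case-insensitive)."""
    seen = set()
    unique = []
    for header, seq in records:
        # Hash the sequence so only a short digest is kept per genome.
        digest = hashlib.sha256(seq.upper().encode("ascii")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((header, seq))
    return unique
```

Records are kept in first-seen order, so the first copy of each sequence is the one that survives.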


I need whole bacterial genomes. Is there any way to download them?

8.3 years ago
Sej Modha 5.3k

You might find this post useful.


Yes! I did use this, but it is not unique. It has many repeats.


There is no non-redundant database of bacterial genomes. You would need to make one yourself. While there may be a few repeats, many of the genomes are likely different strains of a particular species and so may appear redundant.


Can you teach me how to make one? Thanks.


Are you only looking for "refseq" genomes (ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/assembly_summary_refseq.txt ; the actual data will be in ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/ ) or all bacterial genomes?

In any case, this would involve getting a list of available genomes following @5heikki's recipe (C: Download All The Bacterial Genomes From Ncbi ), then parsing that list to make a non-redundant set, downloading the FASTA genomes for that set, and making BLAST indexes.
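
For the download step, each row of the assembly summary carries an `ftp_path` column that points to a directory, not to a file. As far as I know, the genomic FASTA inside that directory is named after the directory itself with a `_genomic.fna.gz` suffix. A minimal sketch of building that URL, assuming this naming convention holds for the rows you selected (`genomic_fasta_url` is a hypothetical helper name):

```python
def genomic_fasta_url(ftp_path):
    """Build the genomic FASTA URL from an assembly summary ftp_path value.

    Assumes the file is named <directory name>_genomic.fna.gz.
    """
    ftp_path = ftp_path.rstrip("/")
    # The last path component is the "<accession>_<assembly name>" directory.
    basename = ftp_path.rsplit("/", 1)[-1]
    return f"{ftp_path}/{basename}_genomic.fna.gz"
```

The resulting URLs can be written to a text file, one per line, and handed to whatever download tool you prefer.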


I did use @5heikki's recipe. But how do I do the "parsing that list to make a non-redundant set" step? I want to use bowtie2.
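
One possible way to parse the list, offered as a sketch and not as the definitive method: treat "non-redundant" as "one assembly per species" and keep the best-ranked row for each `species_taxid`. The column names and values used here (`version_status`, `refseq_category`, `assembly_level`, `species_taxid`) are the ones I understand the assembly summary file to use, so check them against the header of the file you downloaded. The ranking rule and the function names are assumptions made for illustration.

```python
# Preference order for picking a single assembly per species (lower is better).
CATEGORY_RANK = {"reference genome": 0, "representative genome": 1}
LEVEL_RANK = {"Complete Genome": 0, "Chromosome": 1, "Scaffold": 2, "Contig": 3}


def parse_assembly_summary(lines):
    """Parse assembly summary lines into a list of dicts keyed by column name.

    The last '#' line before the data is taken as the tab-separated header.
    """
    header = None
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            header = line.lstrip("#").strip().split("\t")
            continue
        if header is None or not line.strip():
            continue
        rows.append(dict(zip(header, line.split("\t"))))
    return rows


def pick_one_per_species(rows):
    """Keep the best-ranked 'latest' assembly for each species_taxid."""
    best = {}
    for row in rows:
        if row.get("version_status") != "latest":
            continue
        rank = (
            CATEGORY_RANK.get(row.get("refseq_category"), 2),
            LEVEL_RANK.get(row.get("assembly_level"), 4),
        )
        key = row["species_taxid"]
        # Ties keep the row that appeared first in the file.
        if key not in best or rank < best[key][0]:
            best[key] = (rank, row)
    return [row for _, row in best.values()]
```

Typical use would be `pick_one_per_species(parse_assembly_summary(open("assembly_summary.txt")))`, followed by downloading the FASTA file for each selected row. After concatenating those files into one FASTA, `bowtie2-build combined.fna index_name` builds the index (both file names are placeholders). If you want to keep distinct strains, group on `taxid` instead of `species_taxid`.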
