Where can I find the genome of a single bacterium, e.g. E. coli? I downloaded RNA-seq reads of E. coli from SRA and now I would like to align them to the E. coli genome using BWA.
Ensembl would be my first bet. Odds are good you mean this one in particular, though there are many other substrains that have been sequenced.
I have one fasta file from that strain, but also this one: http://www.ncbi.nlm.nih.gov/sra/?term=SRR1187101
Is there some way to know whether this strain's genome has been sequenced, besides manually checking all sources? Would it be OK if I just aligned to the strain you proposed?
Thanks!
(N.B., I don't work on E. coli, so take this with an appropriately sized grain of salt!) Yeah, I'd go ahead and align the reads to the aforementioned reference. If you really want to, you could then take the sequence of a gene that shows a lot of differences from the reference and BLAST it to see whether a closer strain exists. At the end of the day, it really depends on what your goals are. The original study that you just linked to was looking at the association of strain sequence with a clinical phenotype, so in many ways the exact reference strain used may not have been that important.
BTW, you might also consider de novo or reference-based assembly.
Lots of places.
All easily found via a web search for "bacterial genomes database".
I know that this question is already almost 3 years old, but I hope that my answer might be useful to others anyway.
I implemented a standardized way to automate the genome retrieval process in R (see biomartr package).
To retrieve a bacterial reference genome from one of several database sources using only the scientific name of the bacterium of interest, one can simply type:
# download Escherichia coli reference genome from NCBI RefSeq
biomartr::getGenome(db = "refseq", organism = "Escherichia coli")
or
# download Escherichia coli reference genome from NCBI Genbank
biomartr::getGenome(db = "genbank", organism = "Escherichia coli")
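To check first whether a genome is listed for the organism at all, and to keep the location of the downloaded file for later steps, something along these lines should work. This is only a sketch: it assumes biomartr is installed and the machine has network access, that `is.genome.available()` with `details = TRUE` returns a table of matching assemblies, and that `getGenome()` returns the path of the file it downloaded. The variable names and the `path` value are illustrative.

```r
library(biomartr)

# List the RefSeq assemblies that match the organism name
# (assumption: details = TRUE gives a table rather than TRUE/FALSE)
available <- is.genome.available(db = "refseq",
                                 organism = "Escherichia coli",
                                 details = TRUE)
print(available)

# Download the genome and keep the path of the downloaded file
# (assumption: getGenome() returns that path as a character string)
genome_path <- getGenome(db = "refseq",
                         organism = "Escherichia coli",
                         path = file.path("_ncbi_downloads", "genomes"))
print(genome_path)
```

The table printed by the first call includes strain information, which may help when deciding whether a particular strain has an assembly in the chosen database.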
In case you wish to download all available bacterial genomes at once, simply type:
# download all bacterial reference genomes from NCBI RefSeq
biomartr::meta.retrieval(kingdom = "bacteria", db = "refseq", type = "genome")
For more details about downloading specific genomes from specific kingdoms or subkingdoms of life please consult the Genomic Sequence Retrieval vignette of the biomartr package. For metagenome downloads, please consult the Meta-Genome Retrieval vignette and for entire database retrieval the Database Retrieval vignette.
Please note that to promote computational reproducibility in genomics and metagenomics studies, biomartr stores log files for each downloaded genome, proteome, or CDS file.
An example log file looks as follows:
File Name: Escherichia_coli_genomic_refseq.fna.gz
Organism Name: Escherichia_coli
Database: NCBI refseq
Download_Date: Wed Feb 15 15:17:50 2017
refseq_category: reference genome
assembly_accession: GCF_000005845.2
bioproject: PRJNA57779
biosample: SAMN02604091
taxid: 511145
infraspecific_name: strain=K-12 substr. MG1655
version_status: latest
release_type: Major
genome_rep: Full
seq_rel_date: 2013-09-26
submitter: Univ. Wisconsin
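To connect this back to the original question, the downloaded FASTA file can be handed to BWA directly from R. The following is a rough sketch, not a tested pipeline: `bwa_args()` is a hypothetical helper, the reference path assumes the file name from the log above inside an assumed download folder, the reads file name is a placeholder for whatever was extracted from SRA (single-end assumed), and `bwa` must be on the PATH. As far as I know BWA reads gzip-compressed FASTA, but decompress the file first if your build does not.

```r
# Hypothetical helper: build the argument vectors for `bwa index` and `bwa mem`.
# Paths are shell-quoted because system2() pastes arguments into one command line.
bwa_args <- function(reference, reads) {
  list(
    index = c("index", shQuote(reference)),
    mem   = c("mem", shQuote(reference), shQuote(reads))
  )
}

# Assumed locations; adjust to where the files actually are
reference <- file.path("_ncbi_downloads", "genomes",
                       "Escherichia_coli_genomic_refseq.fna.gz")
reads <- "SRR1187101.fastq"  # placeholder name for the downloaded reads

args <- bwa_args(reference, reads)

# Run only if the bwa executable and both input files are present
if (nzchar(Sys.which("bwa")) && all(file.exists(reference, reads))) {
  system2("bwa", args$index)                        # build the BWA index
  system2("bwa", args$mem, stdout = "aligned.sam")  # write alignments as SAM
}
```

Keeping the argument construction in a separate function makes it easy to inspect the exact commands before anything is run.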
Single bacterium; single bacterial species;
single bacteria.