I have several bacterial draft genomes assembled with SPAdes. After checking them with CheckM, I want to annotate the ones with contamination < 5% in order to find out which species they are. I aligned the draft genomes to the nt database, but I found that one particular genome could be assigned to several different bacteria, so I am afraid this approach is not sound. Is there a better approach for quick annotation of bacteria? Thanks in advance.
Are these mixed samples (e.g. metagenomes)? Are you expecting contamination? If you have single, clean draft genome assemblies, Prokka is a tool of choice for annotation.
Yes, my draft genomes were derived from metagenomes, and I tried to split them into single bacteria. Afterward, I used CheckM to determine whether each one is clean. Each separated bacterium has hundreds of contigs. So now I want to know what each of the separated bacteria is.
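The filtering step described here (keeping only bins with contamination < 5%) can be sketched roughly as follows. This assumes the CheckM results were exported as a tab-separated table (for example with `checkm qa --tab_table`) that has `Bin Id`, `Completeness` and `Contamination` columns. The function name `select_clean_bins`, the bin names and the numbers in the demo table are invented for illustration only.

```python
import csv
import io


def select_clean_bins(handle, max_contamination=5.0, min_completeness=0.0):
    """Return the IDs of bins that pass the contamination/completeness cut-offs.

    `handle` is an open text file (or file-like object) holding a
    tab-separated CheckM table. Column names are an assumption about
    the export format.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    kept = []
    for row in reader:
        completeness = float(row["Completeness"])
        contamination = float(row["Contamination"])
        if contamination < max_contamination and completeness >= min_completeness:
            kept.append(row["Bin Id"])
    return kept


# Made-up table standing in for a real CheckM export.
demo = io.StringIO(
    "Bin Id\tCompleteness\tContamination\n"
    "bin.1\t92.4\t1.8\n"
    "bin.2\t75.0\t12.3\n"
    "bin.3\t8.5\t0.0\n"
)
print(select_clean_bins(demo))
```

Raising `min_completeness` would additionally drop very fragmentary bins that pass the contamination cut-off only because they contain few marker genes.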
Assuming that you have a computer with at least 128 GB of RAM (or a combination of RAM + swap > 128 GB), the most consistent way of doing this is by using the GTDB toolkit (GTDB-Tk). Below is an example of the final output for each bin that is more than 10% complete (it is truncated on the right side so as to not run too far out). As you can see, most metagenomic bins are classified down to the family or genus level, with a couple of them having a species designation.
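To see at a glance how far down each bin was classified, something like the following minimal sketch could be used. It assumes a GTDB-Tk summary table (for example `gtdbtk.bac120.summary.tsv`) with `user_genome` and `classification` columns, and GTDB-style lineage strings with rank prefixes (`d__`, `p__`, ... `s__`). The helper names and the two demo rows are made up for illustration and are not real results.

```python
import csv
import io

# GTDB rank prefixes, ordered from highest to lowest rank.
RANKS = {
    "d": "domain", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species",
}


def parse_lineage(classification):
    """Split 'd__X;p__Y;...' into {rank_name: taxon}, skipping empty ranks."""
    lineage = {}
    for field in classification.split(";"):
        prefix, sep, taxon = field.strip().partition("__")
        if sep and prefix in RANKS and taxon:
            lineage[RANKS[prefix]] = taxon
    return lineage


def lowest_assigned_rank(classification):
    """Return (rank_name, taxon) for the most specific rank that is filled in."""
    lineage = parse_lineage(classification)
    for rank in reversed(list(RANKS.values())):
        if rank in lineage:
            return rank, lineage[rank]
    return None  # nothing recognisable, e.g. an unclassified bin


def summarize(handle):
    """Map each genome ID in a tab-separated summary table to its lowest rank."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["user_genome"]: lowest_assigned_rank(row["classification"])
            for row in reader}


# Made-up rows standing in for a real summary file.
demo = io.StringIO(
    "user_genome\tclassification\n"
    "bin.1\td__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;"
    "f__Bacteroidaceae;g__Bacteroides;s__\n"
    "bin.2\td__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;"
    "o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;"
    "s__Escherichia coli\n"
)
for genome, result in summarize(demo).items():
    print(genome, result)
```

An empty `s__` field means the bin could only be placed at genus level or above, which is the common case for metagenomic bins.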
As Mensur said, GTDB can be used to determine the taxonomic affiliation of your bins. In case you do not have 128 GB of RAM, GTDB has been implemented in KBase.
Can you clarify what
means?
I would try using a dedicated metagenomic annotation and binning pipeline such as: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03585-4
Though I've never tried it myself, so I can't vouch for it.