How to determine new bacterial species from custom isolate?
6 weeks ago

I isolated an environmental bacterium, sequenced its whole genome, obtained a consensus genome, and determined the closest phylogenetic relative (reference).

How can I determine whether the new isolate is a new species or a strain of the reference species? What percentage difference is needed to call it a new species? Which programs can I use to calculate that percentage?

Thank you

species genome submission • 901 views
6 weeks ago
Mensur Dlakic ★ 29k

FastANI will quickly calculate the identity between two FASTA files.

https://github.com/ParBLiSS/FastANI

There is no hard consensus on what constitutes a new species, but you would have a good case if the average nucleotide identity (ANI) between your bacterium and its closest relative is < 95%.
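
As a rough illustration of the 95% rule of thumb, here is a minimal Python sketch that reads one line of FastANI output (as written by something like `fastANI -q isolate.fa -r reference.fa -o ani.tsv`) and applies the cutoff. The file names, the function names, and the sample values are hypothetical, and the 95% cutoff is a convention rather than a formal rule.

```python
from typing import NamedTuple


class AniHit(NamedTuple):
    query: str
    reference: str
    ani: float             # percent identity reported by FastANI
    mapped_fragments: int  # bidirectional fragment mappings
    total_fragments: int   # total query fragments


def parse_fastani_line(line: str) -> AniHit:
    """Parse one tab-delimited FastANI row:
    query, reference, ANI, mapped fragments, total query fragments."""
    query, reference, ani, mapped, total = line.rstrip("\n").split("\t")
    return AniHit(query, reference, float(ani), int(mapped), int(total))


def looks_like_new_species(hit: AniHit, cutoff: float = 95.0) -> bool:
    """True when ANI to the closest relative is below the cutoff."""
    return hit.ani < cutoff


# Made-up example row, not real data.
example = "isolate.fa\treference.fa\t98.1\t1450\t1500"
hit = parse_fastani_line(example)
print(hit.ani, looks_like_new_species(hit))
```

An ANI below the cutoff only supports the case for a new species if the reference really is the closest known relative, so it is worth comparing against more than one candidate genome.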

You can also try classifying the genome using GTDB-Tk:

https://github.com/Ecogenomics/GTDBTk

It uses the GTDB reference database to classify your genome(s) of interest. If the classification reaches the species level, your genome is similar enough to a known bacterial species. If you get a classification only down to the genus level (or any level above it), that is again a strong indicator that you have a new species.
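
To make the "species level vs. genus level" reading concrete, here is a small Python sketch that inspects a GTDB-style taxonomy string and reports the most specific rank that carries a name. It assumes the usual `d__...;p__...;...;s__` layout, where an empty `s__` means no species assignment; the function name and the example strings are illustrative only.

```python
from typing import Tuple

RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def lowest_assigned_rank(classification: str) -> Tuple[str, str]:
    """Return (rank, name) for the most specific rank that has a name."""
    result = ("unclassified", "")
    for rank, field in zip(RANKS, classification.split(";")):
        # Each field looks like 'g__Bacillus'; an empty name means unassigned.
        name = field.split("__", 1)[1] if "__" in field else ""
        if not name:
            break
        result = (rank, name)
    return result


# Illustrative strings, not output from a real run.
genus_only = "d__Bacteria;p__Bacillota;c__Bacilli;o__Bacillales;f__Bacillaceae;g__Bacillus;s__"
to_species = "d__Bacteria;p__Bacillota;c__Bacilli;o__Bacillales;f__Bacillaceae;g__Bacillus;s__Bacillus subtilis"

print(lowest_assigned_rank(genus_only))
print(lowest_assigned_rank(to_species))
```

A result that stops at "genus" would correspond to the "possible new species" case described above.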


Thank you, I think the FastANI solution (or skani, a newer alternative) is the fastest. By the way, I got an ANI of 98%, Align_fraction_ref of 95%, and Align_fraction_query of 99%; should I consider only the ANI as the final result?


Your ANI score indicates that you have a strain of a known species. Still, it wouldn't hurt to try GTDB-Tk classification.
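
One way to read the ANI together with the aligned fractions: a high ANI is only convincing when it was computed over most of both genomes. The Python sketch below encodes that idea. The 95% ANI cutoff is the rule of thumb from this thread, whereas the 50% minimum aligned fraction is an assumed placeholder, not a value anyone here recommended; the function name is hypothetical.

```python
def same_species_call(ani: float, af_ref: float, af_query: float,
                      ani_cutoff: float = 95.0, min_af: float = 50.0) -> str:
    """Combine ANI with aligned fractions (all values in percent)."""
    # If only a small part of either genome aligned, the ANI is not representative.
    if min(af_ref, af_query) < min_af:
        return "inconclusive: too little of the genomes aligned"
    if ani >= ani_cutoff:
        return "same species (likely a strain)"
    return "candidate new species"


# Values quoted in the question above.
print(same_species_call(98.0, 95.0, 99.0))
```

With the numbers from the question, both aligned fractions are high, so the 98% ANI can be taken at face value.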


Thank you, but installing GTDB-Tk is a bit cumbersome (90 to 500 GB of disk space, 32 h of downloading on my machine); is there an online service?


KBase should be able to do GTDB classification. You will have to create an account to use it.

https://www.kbase.us/

6 weeks ago
michael.ante ★ 4.0k

Different species have different inherent mutation rates; check whether published values exist for your species of interest and work out whether your isolate falls within the 'normal' variation.

Also check whether your species has an MLST scheme, so you can see whether your isolate clusters with a known strain (you can use e.g. https://github.com/tseemann/mlst ).

You can also use cgMLST (e.g. chewBBACA) to see how your isolate clusters within a set of given strains (which you can obtain from NCBI).
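
To show what "clustering" means for MLST or cgMLST profiles, here is a minimal Python sketch that counts allele differences between profiles and finds the nearest known strain. It does not call mlst or chewBBACA; the profiles, strain names, and function names are made up for illustration, and real tools handle missing loci and novel alleles in more detail.

```python
from typing import Dict, List, Optional

# One allele number per locus, in the same locus order; None marks a missing locus.
Profile = List[Optional[int]]


def allelic_distance(a: Profile, b: Profile) -> int:
    """Count loci with different alleles, skipping loci missing in either profile."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same loci")
    return sum(1 for x, y in zip(a, b)
               if x is not None and y is not None and x != y)


def closest_strain(isolate: Profile, known: Dict[str, Profile]) -> str:
    """Name of the known strain with the fewest allele differences."""
    return min(known, key=lambda name: allelic_distance(isolate, known[name]))


# Made-up profiles over five loci.
profiles = {
    "strain_A": [1, 4, 2, 7, 3],
    "strain_B": [1, 4, 5, 7, None],
    "strain_C": [9, 8, 5, 6, 3],
}
isolate = [1, 4, 2, 7, 3]

print(closest_strain(isolate, profiles))
```

A pairwise matrix of these distances is the kind of input that tree or minimum-spanning-tree viewers work from.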


I ran the MLST analysis against 14 genomes and got this kind of picture (image not included).

What would be the interpretation? Thank you


Hey,

I don't think you can interpret this graph with so few genomes.

I'd try loading the resulting matrix into PHYLOViZ and checking in the tree how your samples cluster relative to the known species.

6 weeks ago
5heikki 11k

mash dist -k 17 -s 50000: the distance should be at least 0.03 if we go by the "classic" 97% similarity species threshold, but it's not really that simple. Anyway, if the distance is smaller, then it's definitely not a new species.
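
For intuition about where the 0.03 figure comes from, here is a short Python sketch of the Mash distance formula, D = -(1/k) ln(2j / (1 + j)), where j is the Jaccard index of the two k-mer sketches, together with the rough reading "identity ≈ 1 - D". It does not run Mash itself; the function names are hypothetical, and the 0.03 threshold is the rule of thumb quoted above, not a hard limit.

```python
import math


def mash_distance(jaccard: float, k: int = 17) -> float:
    """Mash distance from a Jaccard index j and k-mer size k."""
    if not 0.0 <= jaccard <= 1.0:
        raise ValueError("Jaccard index must be between 0 and 1")
    if jaccard == 0.0:
        return 1.0  # no shared k-mers: maximal distance
    d = -math.log(2.0 * jaccard / (1.0 + jaccard)) / k
    return min(1.0, max(0.0, d))


def approx_identity_percent(distance: float) -> float:
    """Rough nucleotide identity implied by a Mash distance."""
    return 100.0 * (1.0 - distance)


def could_be_new_species(distance: float, min_distance: float = 0.03) -> bool:
    """Apply the rule of thumb: below the threshold, not a new species."""
    return distance >= min_distance


print(approx_identity_percent(0.03), could_be_new_species(0.01))
```

A distance above the threshold is only a necessary condition under this rule of thumb, which matches the caveat that "it's not really that simple".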
