Dear all,
I've performed a core SNP analysis on a set of E. coli isolates, and now my question is rather philosophical: what would you say is the cutoff for calling a sequence/clade different from another? For example, in a more wet-lab approach such as PFGE, I would use 90% similarity as the cutoff. With SNPs, I believe it is much more difficult to set this breakpoint, since (please correct me if I'm wrong):
1) SNP phylogeny has much higher resolution.
2) Because of that higher resolution, some inherent errors are inevitable.
3) The number of SNPs that can be found is not constant across species/isolates (or it might be, but only within a broad range).
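To make the cutoff question a bit more concrete, here is a minimal Python sketch of one common way a SNP threshold gets applied once you have a core-genome alignment: count pairwise SNP differences, then group isolates whose distances fall at or under a chosen threshold. The function names, the toy sequences, treating `-`/`N` as missing data, and the choice of single-linkage grouping are all my own illustrative assumptions, not any particular tool's behaviour, and `max_snps` is left as a parameter on purpose because no specific value is being suggested here.

```python
from itertools import combinations

# Characters treated as missing data and skipped when counting SNPs (assumption).
IGNORED = {"-", "N"}


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned core-genome sequences differ,
    skipping positions that are a gap or N in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in IGNORED and b not in IGNORED
    )


def cluster_by_threshold(alignment: dict[str, str], max_snps: int) -> list[set[str]]:
    """Single-linkage grouping: isolates share a cluster if they are linked
    by a chain of pairs that are each at most max_snps apart."""
    parent = {name: name for name in alignment}

    def find(x: str) -> str:
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(alignment, 2):
        if snp_distance(alignment[a], alignment[b]) <= max_snps:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters: dict[str, set[str]] = {}
    for name in alignment:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


if __name__ == "__main__":
    # Hypothetical 10-bp "core alignment" of four isolates.
    toy = {
        "iso1": "ACGTACGTAC",
        "iso2": "ACGTACGTAA",  # 1 SNP from iso1
        "iso3": "TCGAACGTAA",  # 2 SNPs from iso2, 3 from iso1
        "iso4": "TTTTACGTAC",  # at least 3 SNPs from every other isolate
    }
    for threshold in (1, 2):
        groups = sorted(sorted(c) for c in cluster_by_threshold(toy, threshold))
        print(threshold, groups)
    # prints:
    # 1 [['iso1', 'iso2'], ['iso3'], ['iso4']]
    # 2 [['iso1', 'iso2', 'iso3'], ['iso4']]
```

The toy run also shows something to keep in mind with single linkage: at `max_snps=2`, iso1 and iso3 land in the same cluster even though they are 3 SNPs apart, because each is within 2 SNPs of iso2. So the resulting clusters depend on the linkage rule as well as on the threshold itself.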
Any opinions/discussions would be very welcome!
Since you are inviting discussion, I would contend that with a 10% difference in sequence the organism would no longer be E. coli. I have not worked with bacterial phylogenetics at the sequence level, but that seems like a huge difference.
Hey @GenoMax! Yes, I completely agree. The 10% cutoff I mentioned applies to PFGE, in which you digest the DNA of each isolate with certain restriction enzymes and then analyse the band pattern of each isolate as a 0/1 matrix. This method was considered the gold standard for typing (is it still?) and yields different pulsetypes. In any case, as I said in my three points and as you state, a method with much higher sensitivity should need a much lower cutoff for distinguishing between types/clusters of isolates within the same species. My question now is: what is this cutoff value? I've been through a lot of papers and can't seem to find such a value, or even a range.
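For comparison with the SNP side, here is a minimal sketch of where a "90% similarity" figure comes from when band patterns are scored as a 0/1 matrix. It assumes the bands have already been matched to shared positions (gel analysis software normally also applies a position tolerance, which is not modelled here), and it uses the Dice coefficient, which is a common choice for band data but may not be what a given lab or software package used. The function names, isolate names, and band matrix are hypothetical.

```python
from itertools import combinations


def dice_similarity(bands_a: list[int], bands_b: list[int]) -> float:
    """Dice coefficient for two 0/1 band-presence vectors scored against the
    same band positions: 2 * shared bands / (bands in a + bands in b)."""
    if len(bands_a) != len(bands_b):
        raise ValueError("band vectors must cover the same band positions")
    shared = sum(1 for a, b in zip(bands_a, bands_b) if a and b)
    total = sum(bands_a) + sum(bands_b)
    # Convention chosen here: two lanes with no bands at all count as identical.
    return 2 * shared / total if total else 1.0


def pairs_above_cutoff(profiles: dict[str, list[int]], cutoff: float = 0.90):
    """List (isolate, isolate, similarity) for every pair at or above cutoff."""
    hits = []
    for a, b in combinations(profiles, 2):
        s = dice_similarity(profiles[a], profiles[b])
        if s >= cutoff:
            hits.append((a, b, round(s, 3)))
    return hits


if __name__ == "__main__":
    # Hypothetical band matrix: rows are isolates, columns are band positions.
    profiles = {
        "iso1": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        "iso2": [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],  # one band missing
        "iso3": [1, 1, 1, 1, 1, 0, 0, 1, 1, 0],  # three bands missing
    }
    print(pairs_above_cutoff(profiles))  # [('iso1', 'iso2', 0.947)]
```

With ten band positions, a single band difference already moves the score to about 95%, which gives a rough sense of how coarse a PFGE similarity cutoff is compared with counting individual SNPs across a core genome.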