Guys, for a project I'm working on, I need to determine which strains in my data set have the most de novo variants. Normally this analysis requires laboratory experiments: you have to grow clones from a single bacterial cell, sequence their genomes, and compare them with the genome of the initial cell. I was wondering if there is any way to carry out this analysis with public data only, for example with all available E. coli genomes. Is there any statistic that can assess the probability that a variant observed in this type of analysis is de novo rather than inherited?
(Sorry if it's a very silly question.)
Paper suggestions on the topic are welcome.
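
To make clear what I mean by "de novo" in the lab setting, here is a minimal Python sketch. It assumes variants have already been called against a common reference and are stored as simple tuples; the function name, the tuple layout, and the toy data are all made up for illustration, not taken from any real tool or data set.

```python
# Hypothetical variant representation: (contig, position, ref_allele, alt_allele).
Variant = tuple[str, int, str, str]


def de_novo_variants(ancestor: set[Variant], clone: set[Variant]) -> set[Variant]:
    """Return variants present in the clone but absent from its ancestor."""
    return clone - ancestor


# Made-up example calls, not real data.
ancestor = {("chr", 100, "A", "G"), ("chr", 250, "C", "T")}
clones = {
    "clone_1": {("chr", 100, "A", "G"), ("chr", 250, "C", "T"), ("chr", 900, "G", "A")},
    "clone_2": {("chr", 100, "A", "G"), ("chr", 250, "C", "T")},
}

# Count de novo variants per clone relative to the known ancestor.
counts = {name: len(de_novo_variants(ancestor, calls)) for name, calls in clones.items()}
print(counts)
```

With public genomes the `ancestor` set is exactly what I don't have, which is why I'm asking whether a statistical approach could stand in for it.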