I used the CFSAN SNP Pipeline to generate a SNP distance matrix for my bacterial isolates using a reference sequence.
I am wondering how to interpret the output when the matrix reports a genetic distance of 1 SNP between two isolates. Surely this cannot mean that the two isolates differ at only one base across the whole genome, because the reads for my isolates do not cover every nucleotide in the genome. Are these SNPs instead based on specific alleles? If so, how many bases/alleles are used, and by what logic are they chosen?
If anyone could explain simply how these matrices are made, it would be greatly appreciated!