There's usually a second quality score in the file as well. If your file has a column like "GT:PL:GQ", then the column after it looks something like "1/1:255,255,0:99". The third value there, the GQ, is also a quality score, and it tops out at 99.
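The GQ is defined as "a phred quality -10log_10 p(genotype call is wrong)". So a GQ of 20 means roughly a 1 in 100 chance that the genotype call is wrong, and a GQ of 30 means roughly 1 in 1000. Again, a high number means that the SNP is likely to be real.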
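To make the phred arithmetic concrete, here is a minimal Python sketch that pairs the "GT:PL:GQ" keys with the example sample column from above and converts the GQ back into an error probability. The function names are made up for illustration, and it assumes the two columns have already been pulled out of a VCF line as plain strings.

```python
def parse_sample(format_field, sample_field):
    """Pair each key in the FORMAT column with its value in the sample column."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


def phred_to_error_prob(q):
    """Invert Q = -10 * log10(p) to get p, the probability the call is wrong."""
    return 10 ** (-q / 10)


# Example columns taken from the text above.
sample = parse_sample("GT:PL:GQ", "1/1:255,255,0:99")
gq = int(sample["GQ"])
p_wrong = phred_to_error_prob(gq)

print(f"GT={sample['GT']} GQ={gq} p(genotype wrong)={p_wrong:.2e}")
```

The same conversion works for the QUAL column, since it is also phred-scaled.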
Which of the two is better to use, I don't know. The QUAL score behaves differently if you have multiple .bam files in your vcf, because there is only one QUAL per site, while each .bam file will have its own GT:PL:GQ at each putative polymorphic locus.
I've done just a bit of Sanger confirmation of SNPs called in an exome capture project, and, for what it's worth, I found that most of the entries that were called as homozygous SNPs in the vcf, even those with poor GQs, turned out to be real. But again, with exome capture, I was expecting a good number of SNPs, and I was counting SNPs that were off target and therefore expected to have low coverage. In a sample where everything is expected to be well covered, like a whole microbial genome, the low-quality SNPs might be less likely to be real.
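I'd also consider looking at the DP4 in the INFO column, which counts the high-quality reference and alternate bases on the forward and reverse strands. Coverage in some cases is a good proxy for quality.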
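As a rough sketch of how the DP4 could be used, the Python below pulls DP4 out of an INFO string and reports the high-quality depth and the fraction of bases supporting the alternate allele. The INFO string and the function names are hypothetical, and it assumes DP4 is laid out as ref-forward, ref-reverse, alt-forward, alt-reverse, which is how samtools describes it in the vcf header.

```python
def parse_dp4(info):
    """Return (ref_fwd, ref_rev, alt_fwd, alt_rev) from an INFO string, or None."""
    for entry in info.split(";"):
        if entry.startswith("DP4="):
            return tuple(int(x) for x in entry[len("DP4="):].split(","))
    return None


def alt_support(info):
    """Return (depth, alt_count, alt_fraction) from DP4, or None if DP4 is absent."""
    dp4 = parse_dp4(info)
    if dp4 is None:
        return None
    ref_fwd, ref_rev, alt_fwd, alt_rev = dp4
    depth = ref_fwd + ref_rev + alt_fwd + alt_rev
    alt_count = alt_fwd + alt_rev
    alt_fraction = alt_count / depth if depth else 0.0
    return depth, alt_count, alt_fraction


# Hypothetical INFO column, for illustration only.
info = "DP=23;DP4=1,1,12,9;MQ=60"
print(alt_support(info))
```

A homozygous call backed by only a handful of bases, or with all of the alternate bases on one strand, is the kind of entry I'd look at more closely.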
Page about interpreting phred: https://gatk.broadinstitute.org/hc/en-us/articles/360035531872-Phred-scaled-quality-scores