I have been using the multi-sample variant (SNP/SNV) calling method that is part of samtools, and I am not entirely certain how confident I am in its output. The feeling is subjective: when I call variants on each sample file individually and look for shared variants and differences between the samples, I get a set of variants that I am fairly confident in. They have good quality scores, are shared by multiple samples (but not all samples), and they seem fairly robust, i.e. they show up again and again despite changes to the strictness of the mapping or base quality criteria.

Some of these robust variants are absent from the final filtered VCF when I run the multi-sample calling method. One variant in particular, present in all but one sample, was not picked up by the multi-sample samtools run at all. I have many samples, and the average per-sample coverage across the genome is quite high (~100×; I removed the -D filter accordingly). I know the multi-sample method in samtools was optimized for the lower-coverage data of the 1000 Genomes Project, so perhaps part of the discrepancy stems from the underlying statistical machinery being tuned for that kind of data.
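In case it helps to pin down what I am comparing, here is roughly what I run for the two approaches. This is the classic samtools 0.1.x pipeline with `bcftools view` and `vcfutils.pl` (not the newer `bcftools call` syntax); the file names and reference path are placeholders for my actual data.

```
# per-sample calling: one mpileup/bcftools run per BAM
samtools mpileup -uf ref.fa sample1.bam | bcftools view -bvcg - > sample1.raw.bcf
bcftools view sample1.raw.bcf | vcfutils.pl varFilter > sample1.flt.vcf  # no -D depth cap, coverage is ~100x

# multi-sample calling: all BAMs in a single mpileup/bcftools run
samtools mpileup -uf ref.fa sample1.bam sample2.bam sample3.bam \
    | bcftools view -bvcg - > all.raw.bcf
bcftools view all.raw.bcf | vcfutils.pl varFilter > all.flt.vcf
```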
With this in mind, I wanted to get a sense of what people are using for multi-sample SNP/SNV calling: whether many people actually use it (as opposed to calling variants on each sample separately and then amalgamating the information with a script after the fact, along the lines of the sketch below); and whether the results of samtools multi-sample calling, specifically, are reliable (in theory, the multi-sample method should be more reliable than single-sample calling).
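By "amalgamating after the fact" I mean something as simple as intersecting the per-sample site lists with standard Unix tools, as in this rough sketch (it only compares chrom:pos; a proper comparison would also reconcile the alleles):

```
# collect chrom:pos of called sites from each filtered per-sample VCF
for f in sample*.flt.vcf; do
    grep -v '^#' "$f" | awk '{print $1":"$2}' | sort > "${f%.vcf}.sites"
done

# sites shared by two samples (comm -12 keeps lines common to both sorted files)
comm -12 sample1.flt.sites sample2.flt.sites > shared_1_2.sites

# count in how many samples each site was called
sort sample*.flt.sites | uniq -c | sort -rn > site_counts.txt
```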