between using GATK HC in gVCF mode instead of in multi-sample mode?
I'm not sure what you mean, but here are the possible scenarios:
- HaplotypeCaller in gVCF mode vs plain variant calling
- HaplotypeCaller (without regard to gVCF or just variant calling) vs UnifiedGenotyper per sample
I think you meant the second one. HaplotypeCaller also performs de novo assembly of regions containing variants, which gives more confident variant calls. More info is available here: Variant Caller Of Choice?
"Cohort" is usually subjective.
- Cohort: A collection of samples being analyzed together. This organizational unit is the most subjective and depends very specifically on the design goals of the sequencing project. For population discovery projects like the 1000 Genomes, the analysis cohort is the ~100 individual in each population. For exome projects with many deeply sequenced samples (e.g., ESP with 800 EOMI samples) we divide up the complete set of samples into cohorts of ~50 individuals for multi-sample analyses.
From: http://gatkforums.broadinstitute.org/discussion/3059/lane-library-sample-and-cohort-what-do-they-mean-and-why-are-they-important
In our GATK-based pipeline, by a cohort of samples we mean all of the "pools" together. For example, we have four pools. We induced mutation in a plant, and fifteen plants still exhibit a phenotype as if they had not undergone mutation. We call that Pool1. The rest, Pools 2-4 with around 15 physical plants per pool, exhibit the mutation to varying degrees. In the sequencing data, Pool1, 2, 3 and 4 are different samples. The cohort is all of them together.
I think you can go either way on GVCF in your analysis (WT vs mutant); that depends on what you have further along in your downstream processing. In everything I have seen so far, people run HaplotypeCaller in GVCF mode, then GenotypeGVCFs, then Variant Quality Score Recalibration, which in practice uses the VariantRecalibrator and ApplyRecalibration walkers of GATK. From there you select variants with an acceptable VQSLOD, usually >= 4.0. Further filtration might be needed after that.
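To make the last step concrete, here is a rough Python sketch of selecting variants by VQSLOD. It assumes the recalibrated VCF carries a `VQSLOD=<number>` entry in the INFO column (column 8), which is what ApplyRecalibration normally annotates. The function names `parse_info` and `filter_by_vqslod` and the example records are made up for illustration, and the 4.0 cutoff is just the value mentioned above, not a universal recommendation.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=10;VQSLOD=4.2;DB' into a dict."""
    info = {}
    for entry in info_field.split(";"):
        key, sep, value = entry.partition("=")
        # Flag-style entries (no '=') are stored as True.
        info[key] = value if sep else True
    return info


def filter_by_vqslod(vcf_lines, min_vqslod=4.0):
    """Keep header lines and records whose VQSLOD is >= min_vqslod.

    Records without a numeric VQSLOD annotation are dropped (an assumption
    of this sketch; you may prefer to keep or flag them instead).
    """
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # meta-information and column header lines
            continue
        columns = line.rstrip("\n").split("\t")
        vqslod = parse_info(columns[7]).get("VQSLOD")
        if isinstance(vqslod, str) and float(vqslod) >= min_vqslod:
            kept.append(line)
    return kept


# Hypothetical records, only to show the behaviour of the filter.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\tDP=30;VQSLOD=5.12",
    "chr1\t200\t.\tC\tT\t40\tPASS\tDP=12;VQSLOD=1.70",
    "chr1\t300\t.\tG\tA\t60\tPASS\tDP=25",
]
kept = filter_by_vqslod(example)
```

On a real file you would pass the lines of the recalibrated VCF instead of the `example` list. GATK's own variant selection tools can apply the same kind of cutoff; this sketch only shows the logic.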
And yes, you are correct in giving two or more BAMs to GATK.