Hi,
I'm working on QTL mapping in a diploid plant population and have completed my first GATK HaplotypeCaller run.
The result is about 300 VCF files, each containing a single sample from the population.
I read that hard filtering and BQSR are recommended after the first HaplotypeCaller run, so I'm planning to run GATK's VariantFiltration tool.
My question is: since each raw VCF file contains a single sample, is it recommended to merge all the files into one before running further steps, or should I continue with the separate single-sample files?
Additionally, every file currently has a different header. Would it help to work with a single merged, multi-sample VCF file for downstream analyses?
Waiting for help,
Thank you.
With 300 samples, each sample should be called with HaplotypeCaller in GVCF mode (-ERC GVCF); the per-sample GVCFs are then merged with CombineGVCFs and joint-genotyped with GenotypeGVCFs.
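As a rough sketch of that order of steps, the small Python script below only builds and prints the three GATK4 command lines; it does not run GATK. The file names (ref.fasta, sampleN.bam, cohort.g.vcf.gz, cohort.vcf.gz) and the helper function names are made up for illustration, so adjust them to your own data and check the options against your GATK version.

```python
import shlex
from typing import List, Sequence


def haplotypecaller_gvcf_cmd(reference: str, bam: str, out_gvcf: str) -> List[str]:
    """Per-sample calling in GVCF mode (one command per sample)."""
    return ["gatk", "HaplotypeCaller",
            "-R", reference, "-I", bam, "-O", out_gvcf,
            "-ERC", "GVCF"]


def combine_gvcfs_cmd(reference: str, gvcfs: Sequence[str], out_gvcf: str) -> List[str]:
    """Merge all per-sample GVCFs into one multi-sample GVCF."""
    cmd = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in gvcfs:
        cmd += ["--variant", gvcf]
    cmd += ["-O", out_gvcf]
    return cmd


def genotype_gvcfs_cmd(reference: str, combined_gvcf: str, out_vcf: str) -> List[str]:
    """Joint genotyping of the combined GVCF into a single multi-sample VCF."""
    return ["gatk", "GenotypeGVCFs",
            "-R", reference, "-V", combined_gvcf, "-O", out_vcf]


def build_pipeline(reference: str, samples: Sequence[str]) -> List[List[str]]:
    """Return the commands in the order they would be run."""
    gvcfs = [f"{s}.g.vcf.gz" for s in samples]
    cmds = [haplotypecaller_gvcf_cmd(reference, f"{s}.bam", g)
            for s, g in zip(samples, gvcfs)]
    cmds.append(combine_gvcfs_cmd(reference, gvcfs, "cohort.g.vcf.gz"))
    cmds.append(genotype_gvcfs_cmd(reference, "cohort.g.vcf.gz", "cohort.vcf.gz"))
    return cmds


if __name__ == "__main__":
    # Hypothetical sample names; a real run would list all ~300 samples.
    for cmd in build_pipeline("ref.fasta", ["sample1", "sample2", "sample3"]):
        print(shlex.join(cmd))
```

The output of the last command is one multi-sample VCF with a single header, which is the merged file asked about in the question.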
Does that mean that, before combining and final genotyping, every step that handles the VCF files (e.g. hard filtering, BQSR) should be run on the separate single-sample files, using the -ERC GVCF option in each step?