Hi,
I would like to call variants from FASTQ files (whole genomes from a large number of samples, on the order of several hundred) for genetic association studies. I have come across some pipelines, but I'm not sure which one is best suited to what I want to do.
Can anyone with experience in batch variant calling suggest fast and reliable pipelines for this kind of work? Another question: since I only want to generate genotypes against the human reference genome for association studies, if I use GATK, for instance, should I use HaplotypeCaller or MuTect for variant calling?
Any advice on running variant calling in batch would also be welcome. Thanks!
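For reference, a minimal sketch of how the standard GATK4 germline joint-calling steps could be scripted for a batch: HaplotypeCaller in GVCF mode per sample, GenomicsDBImport to consolidate the per-sample GVCFs, then GenotypeGVCFs for joint genotyping. (HaplotypeCaller is GATK's germline caller; Mutect2 is intended for somatic variants.) The file names, output directory, workspace name and interval below are hypothetical. The sketch assumes aligned, sorted and duplicate-marked BAMs already exist, and it only builds command lines rather than running them, so they can be handed to whatever job scheduler is available.

```python
import shlex
from pathlib import Path


def haplotypecaller_cmd(ref: str, bam: str, out_dir: str) -> list[str]:
    """Per-sample germline calling in GVCF mode (-ERC GVCF)."""
    sample = Path(bam).stem  # e.g. "sample1.bam" -> "sample1"
    gvcf = str(Path(out_dir) / f"{sample}.g.vcf.gz")
    return ["gatk", "HaplotypeCaller", "-R", ref, "-I", bam,
            "-O", gvcf, "-ERC", "GVCF"]


def genomicsdb_import_cmd(gvcfs: list[str], workspace: str,
                          interval: str) -> list[str]:
    """Consolidate all per-sample GVCFs for one interval into a GenomicsDB."""
    cmd = ["gatk", "GenomicsDBImport",
           "--genomicsdb-workspace-path", workspace, "-L", interval]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]
    return cmd


def genotype_gvcfs_cmd(ref: str, workspace: str, out_vcf: str) -> list[str]:
    """Joint-genotype every sample stored in the GenomicsDB workspace."""
    return ["gatk", "GenotypeGVCFs", "-R", ref,
            "-V", f"gendb://{workspace}", "-O", out_vcf]


if __name__ == "__main__":
    # Hypothetical inputs; replace with real paths.
    ref = "GRCh38.fasta"
    bams = ["sample1.bam", "sample2.bam"]
    gvcf_dir = "gvcfs"

    step1 = [haplotypecaller_cmd(ref, b, gvcf_dir) for b in bams]
    gvcfs = [c[c.index("-O") + 1] for c in step1]
    step2 = genomicsdb_import_cmd(gvcfs, "db_chr20", "chr20")
    step3 = genotype_gvcfs_cmd(ref, "db_chr20", "chr20.vcf.gz")

    for cmd in step1 + [step2, step3]:
        print(shlex.join(cmd))
```

The per-sample HaplotypeCaller jobs are independent of each other and can run in parallel; the import and joint-genotyping steps are usually split by chromosome or interval, since GenomicsDBImport requires intervals to be specified.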
Are you aware that hundreds of WGS samples will consume several tens of terabytes for the raw data alone? Do you have the computational resources to store these amounts of data, and the CPU and memory to align and process them?
Thanks for raising these two potential problems, @ATpoint. We have a machine with 250 GB of RAM and a 5 TB hard drive. However, I wonder whether the work is still feasible with these resources.
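A back-of-envelope sketch of the storage estimate discussed above. The defaults are assumptions, not measurements: 30x coverage, a ~3.1 Gbp human genome and roughly 0.7 bytes per sequenced base for gzipped FASTQ (actual compression varies with read length, quality-score binning and compressor). It counts raw FASTQ only; aligned BAM/CRAM files, GVCFs and temporary files come on top of this.

```python
def raw_fastq_tb(n_samples: int,
                 coverage: float = 30.0,
                 genome_size_bp: float = 3.1e9,
                 gz_bytes_per_base: float = 0.7) -> float:
    """Rough size of gzipped FASTQ for n_samples, in TB (1 TB = 1e12 bytes).

    All defaults are assumptions for illustration only.
    """
    bases_per_sample = coverage * genome_size_bp
    total_bytes = n_samples * bases_per_sample * gz_bytes_per_base
    return total_bytes / 1e12


def samples_fitting(disk_tb: float, **kwargs: float) -> int:
    """How many samples' raw FASTQ fit on a disk, ignoring all other files."""
    per_sample_tb = raw_fastq_tb(1, **kwargs)
    return int(disk_tb // per_sample_tb)


if __name__ == "__main__":
    for n in (100, 300, 500):
        print(f"{n} samples: ~{raw_fastq_tb(n):.1f} TB of gzipped FASTQ")
    print(f"Samples whose raw FASTQ fits on 5 TB: {samples_fitting(5.0)}")
```

Under these assumptions each sample needs roughly 65 GB of raw FASTQ, so a 5 TB disk holds the raw data for fewer than 80 samples, before any alignments or intermediate files are written.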