I have a large number of samples, at least 100, all sequenced as paired-end reads. I aim to call variants on these samples and then predict their effects on protein structure dynamics.
The only approach that seems possible for now is to align each sample individually, pre-process each one individually, call variants on each one individually, and then combine the per-sample calls into a single gVCF for analysis.
This, however, seems very time-intensive and computationally cumbersome. What would be the alternatives?
I'm currently using standard bash scripts and plan to use various tools, viz. GATK, FreeBayes, VarScan 2, Pindel, etc.
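
For concreteness, the per-sample workflow described above could be sketched in bash roughly as follows. This is only an illustration under stated assumptions: the aligner (bwa) and samtools are not named in the question and are swapped in as common choices, the reference path, thread count, sample names, and the `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` naming scheme are hypothetical, and the script only prints the commands rather than running them.

```shell
#!/usr/bin/env bash
# Sketch only: prints the per-sample and joint commands instead of executing them.
set -euo pipefail

REF="ref.fa"                  # hypothetical reference FASTA (assumed already indexed)
THREADS=4                     # hypothetical thread count per sample
SAMPLES=(sampleA sampleB)     # hypothetical sample names; the real list would have 100+

# Print the align / mark-duplicates / gVCF-calling commands for one sample.
# Assumes paired-end reads named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz.
sample_commands() {
  local s="$1"
  local rg="@RG\\tID:${s}\\tSM:${s}\\tPL:ILLUMINA"   # read group; literal \t is expanded by bwa
  printf '%s\n' \
    "bwa mem -t ${THREADS} -R '${rg}' ${REF} ${s}_R1.fastq.gz ${s}_R2.fastq.gz | samtools sort -o ${s}.sorted.bam -" \
    "gatk MarkDuplicates -I ${s}.sorted.bam -O ${s}.dedup.bam -M ${s}.metrics.txt" \
    "samtools index ${s}.dedup.bam" \
    "gatk HaplotypeCaller -R ${REF} -I ${s}.dedup.bam -O ${s}.g.vcf.gz -ERC GVCF"
}

# Print the commands that combine the per-sample gVCFs and genotype them jointly.
joint_commands() {
  local s v_args=""
  for s in "$@"; do
    v_args+=" -V ${s}.g.vcf.gz"
  done
  printf '%s\n' \
    "gatk CombineGVCFs -R ${REF}${v_args} -O cohort.g.vcf.gz" \
    "gatk GenotypeGVCFs -R ${REF} -V cohort.g.vcf.gz -O cohort.vcf.gz"
}

for s in "${SAMPLES[@]}"; do
  sample_commands "$s"
done
joint_commands "${SAMPLES[@]}"
```

Printing the commands keeps the sketch runnable without any of the tools installed; the output can be inspected before anything is executed.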