7.1 years ago
hermathena
Hi,
Is anyone aware of a recent comparison of variant callers (GATK, FreeBayes, etc.) for non-model organisms, please? There are many for human data, obviously because good reference sets exist there. My data are hundreds of whole genomes from an insect species (>10% of sites are variable!), and we have traditionally used GATK. However, GATK HaplotypeCaller is rather slow on this data. Sensitivity matters more to us than precision (we are not looking for specific SNP associations).
Just FYI, the Broad is about to release GATK4 and states that notable speed improvements have been made. It may be worth trying the GATK4 beta release to see whether it performs well for your task.
Thanks for this. I have experimented with the GATK v4 beta. There are some speed gains through multithreading. Unfortunately, many bugs also crop up, and the Broad recommends not using GATK4 with Spark for now. That horrible Queue parallelisation is gone, but now you need to use something called GenomicsDBImport to merge gVCFs, and that has to run separately on each scaffold... For the time being, one may as well use GATK3. There is still the benefit of the InDel model.
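Since GenomicsDBImport has to run once per scaffold, the per-scaffold commands can at least be generated automatically from the reference's `.fai` index (written by `samtools faidx`), which lists one scaffold name per line in its first column. The sketch below only builds the command lines; the helper names `read_scaffolds` and `genomicsdb_commands`, the `db_<scaffold>` workspace naming and the file names are hypothetical. The arguments used (`-V`, `--genomicsdb-workspace-path`, `-L`) follow the GATK 4.0 release spelling; beta builds may spell them differently, so check `gatk GenomicsDBImport --help` for your version.

```python
import shlex
import tempfile
from pathlib import Path


def read_scaffolds(fai_path):
    """Return scaffold names from a samtools .fai index (first tab-separated column)."""
    names = []
    with open(fai_path) as fh:
        for line in fh:
            if line.strip():
                names.append(line.split("\t", 1)[0])
    return names


def genomicsdb_commands(gvcfs, scaffolds, workspace_root):
    """Build one GenomicsDBImport command (as an argv list) per scaffold.

    Hypothetical helper: each scaffold gets its own workspace directory,
    because GenomicsDBImport works on one interval set at a time.
    """
    cmds = []
    for scaffold in scaffolds:
        cmd = ["gatk", "GenomicsDBImport"]
        for gvcf in gvcfs:
            cmd += ["-V", gvcf]  # one -V per sample gVCF
        cmd += [
            "--genomicsdb-workspace-path",
            str(Path(workspace_root) / f"db_{scaffold}"),
            "-L",
            scaffold,  # restrict this import to a single scaffold
        ]
        cmds.append(cmd)
    return cmds


if __name__ == "__main__":
    # Demo with a made-up two-scaffold index; file names are placeholders.
    with tempfile.TemporaryDirectory() as tmp:
        fai = Path(tmp) / "ref.fa.fai"
        fai.write_text("scaffold_1\t152000\t12\t60\t61\nscaffold_2\t98000\t154550\t60\t61\n")
        gvcfs = ["s1.g.vcf.gz", "s2.g.vcf.gz"]
        for cmd in genomicsdb_commands(gvcfs, read_scaffolds(fai), "gdb"):
            print(shlex.join(cmd))
```

Because each command is independent, the printed lines can be submitted as separate cluster jobs. With hundreds of samples, GATK 4.0 also accepts a sample map file via `--sample-name-map` instead of a long list of `-V` arguments.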