Hi all,
As a first test, I performed variant calling on a single BAM file (from human genome sequencing) using HaplotypeCaller in GATK (version 4). To combine the SNP and indel files after hard filtering, I found that CombineVariants from GATK (version 3) worked well; however, it is not available in version 4. GATK suggests using MergeVcfs instead of CombineVariants. But when I checked the total variant count after using MergeVcfs, it was not correct (in fact, the count was not the sum of the counts in the SNP and indel files). I then tried SortVcf with the simple command below:
gatk SortVcf -I snp.vcf -I indel.vcf -O combined.vcf
With the above command, the total variant count in combined.vcf was the sum of the counts in the SNP and indel files. So the command for combining the SNP and indel files seems right, but I'm not sure about it. Could you please let me know whether this is a correct approach? Please also let me know if there is anything else I should consider for the analysis.
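For reference, the count check described above can be sketched in Python roughly as follows. This is only a minimal sketch, assuming plain-text or gzip-compressed VCF files in which every non-header line is one variant record; `count_records` and `counts_add_up` are hypothetical helper names, not part of GATK.

```python
import gzip


def count_records(path):
    """Count variant records (non-empty lines not starting with '#') in a VCF."""
    # Assumption: files ending in .gz are gzip/bgzip compressed, others are plain text.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for line in handle if line.strip() and not line.startswith("#"))


def counts_add_up(snp_path, indel_path, combined_path):
    """Return True if the combined file holds exactly as many records as both inputs."""
    expected = count_records(snp_path) + count_records(indel_path)
    return count_records(combined_path) == expected
```

Calling `counts_add_up("snp.vcf", "indel.vcf", "combined.vcf")` would return True when the combined count equals the sum of the two input counts. It only compares totals, so it says nothing about whether individual records were altered.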
Thanks a lot
Thank you for your reply. Knowing which SNPs and indels overlap, and specifying which one is retained, is important, isn't it? If so, do you recommend using MergeVcfs to combine the SNP and indel files for the rest of the analysis, or is using SortVcf OK?
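To see which records coincide, something like the following Python sketch could list the positions present in both files. It is only a rough illustration, assuming uncompressed, tab-delimited VCF files; `read_sites` and `shared_sites` are hypothetical helper names, and the comparison is by identical CHROM and POS only, so a deletion that spans a later SNP position would not be reported.

```python
def read_sites(path):
    """Collect the (CHROM, POS) pair of every record in an uncompressed VCF."""
    sites = set()
    with open(path) as handle:
        for line in handle:
            # Skip header lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            chrom, pos = line.split("\t", 2)[:2]
            sites.add((chrom, int(pos)))
    return sites


def shared_sites(snp_path, indel_path):
    """Return the sorted (CHROM, POS) pairs that occur in both files."""
    return sorted(read_sites(snp_path) & read_sites(indel_path))
```

For example, `shared_sites("snp.vcf", "indel.vcf")` would return a list of the positions that appear in both the SNP and the indel file, which could then be inspected by hand.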