Combining SNP and indel VCF files with GATK4
6.2 years ago
seta ★ 1.9k

Hi all,

As a first test, I performed variant calling on a single BAM file (from human genome sequencing) using HaplotypeCaller in GATK (version 4). For combining the SNP and indel files after hard filtering, I found that CombineVariants from GATK (version 3) worked well; however, it is not available in version 4, and GATK suggests using MergeVcfs instead. But when I checked the total variant count after running MergeVcfs, it was not correct (that is, the count was not the sum of the counts in the SNP and indel files). I then tried SortVcf with the simple command below:

gatk SortVcf -I snp.vcf -I indel.vcf -O combined.vcf

Using the above command, the total variant count in combined.vcf was the sum of the counts in the SNP and indel files, so the command seems right for combining them. However, I'm not sure about it. Could you please let me know whether this is a correct approach, and whether there is anything else I should consider for the analysis?
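The count check described above can be reproduced without GATK. The following is a minimal sketch, assuming that every non-header line in each file is one variant record; the function names `count_records` and `counts_add_up` are hypothetical helpers, not part of GATK or Picard.

```python
import gzip


def count_records(path):
    """Count variant records (non-header, non-empty lines) in a VCF.

    Handles plain-text files and gzip/bgzip-compressed files ending in .gz.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(
            1 for line in handle
            if line.strip() and not line.startswith("#")
        )


def counts_add_up(snp_path, indel_path, combined_path):
    """Return True if the combined file holds exactly as many records
    as the SNP and indel files together."""
    expected = count_records(snp_path) + count_records(indel_path)
    return count_records(combined_path) == expected
```

For example, `counts_add_up("snp.vcf", "indel.vcf", "combined.vcf")` compares the three files named in the command above. A matching total only shows that no records were dropped; it does not say whether any SNP and indel records overlap.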

Thanks a lot

variant calling GATK4 combine variants SortVcf • 3.9k views
6.2 years ago

My guess (easily verified) is that you have some overlapping SNPs and indels that are being merged. I don't know which takes precedence in MergeVcfs; v3 CombineVariants had an option to specify which one. SortVcf would not resolve overlaps but merely put them in chromosome/position order, thereby preserving the same number of variants as the two individual VCFs combined.
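One way to verify this guess is to list the sites that appear in both files. The sketch below assumes plain-text (uncompressed) VCFs and treats "overlapping" in the narrowest sense: a SNP record and an indel record with the same CHROM and POS. Because VCF anchors an indel at the base before the event, a deletion that spans a later SNP position would not be caught by this check. The function names `load_sites` and `shared_sites` are hypothetical.

```python
def load_sites(path):
    """Return the set of (CHROM, POS) pairs present in a plain-text VCF."""
    sites = set()
    with open(path) as handle:
        for line in handle:
            # Skip header lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            chrom, pos = line.split("\t", 2)[:2]
            sites.add((chrom, int(pos)))
    return sites


def shared_sites(snp_path, indel_path):
    """Return the sorted list of sites found in both the SNP and indel VCFs."""
    return sorted(load_sites(snp_path) & load_sites(indel_path))
```

If `shared_sites("snp.vcf", "indel.vcf")` returns an empty list, same-position overlaps are not the explanation for the lower count, and it would be worth comparing the record lists directly instead.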


Thank you for your reply. Knowing which SNPs and indels overlap, and specifying which one is retained, is important, isn't it? If so, do you recommend using MergeVcfs to combine the SNP and indel files and carry on with the rest of the analysis, or is using SortVcf OK?
