As recommended in the GATK best practices, Variant Quality Score Recalibration (VQSR) has to be done separately for SNPs and indels. However, I haven't found a clean way to do this split (for instance with vcftools). Does anybody know a tool that does this?
I have already found a script that does the trick, but I am surprised that this functionality is not included in the usual tools for processing VCF files.
As an update: in my use case, vcftools is not able to process my VCF files and reports errors. Specifically, it complains that polyploidy was found, which vcftools does not currently support.
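Because the SNP/indel decision only needs the REF and ALT columns, a splitter that never parses the genotype columns is unaffected by ploidy. The following is a minimal Python sketch, not the script mentioned above; the names `split_vcf`, `is_indel` and `open_text` are hypothetical. It assumes a standard tab-separated VCF (REF in column 4, ALT in column 5) and uses the rule "a site is an indel if any ALT allele differs in length from REF". Everything else goes to the SNP file, including multi-allelic SNPs such as REF=A ALT=C,G. Symbolic alleles (`<DEL>`, `*`) and monomorphic `.` sites are not handled specially.

```python
import gzip
import sys
from typing import IO, Iterable, Tuple


def is_indel(ref: str, alts: Iterable[str]) -> bool:
    """True if any ALT allele has a different length than REF."""
    return any(len(alt) != len(ref) for alt in alts)


def split_vcf(lines: Iterable[str], snp_out: IO[str], indel_out: IO[str]) -> Tuple[int, int]:
    """Copy header lines to both outputs and route each record by REF/ALT only.

    Genotype columns are never parsed, so polyploid samples are not a problem.
    Returns (number of SNP records, number of indel records).
    """
    n_snp = n_indel = 0
    for line in lines:
        if line.startswith("#"):
            # Meta-information and #CHROM header lines go to both files.
            snp_out.write(line)
            indel_out.write(line)
            continue
        if not line.strip():
            continue  # skip stray blank lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")  # split multi-allelic ALT
        if is_indel(ref, alts):
            indel_out.write(line)
            n_indel += 1
        else:
            snp_out.write(line)
            n_snp += 1
    return n_snp, n_indel


def open_text(path: str) -> IO[str]:
    # Assumption: files ending in .gz are gzip/bgzip-compressed text.
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage: python split_vcf.py input.vcf[.gz] snps.vcf indels.vcf
    src, snp_path, indel_path = sys.argv[1:4]
    with open_text(src) as fin, open(snp_path, "w") as fs, open(indel_path, "w") as fi:
        print(split_vcf(fin, fs, fi))
```

Writing uncompressed output keeps the sketch short; the results can be bgzipped and indexed afterwards if downstream tools require it.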
--keep-only-indels
--remove-indels
Include or exclude sites that contain an indel. For this option 'indel' means any variant that alters the length of the REF allele.
This functionality is relatively new, so if you can't use these options on your computer, you are probably running an old version of vcftools.
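If the installed vcftools version supports these options, the split becomes two vcftools runs. Below is a small Python sketch that builds and runs them; the helper names `vcftools_split_cmds` and `run_split` are hypothetical. It assumes `vcftools` is on the PATH and uses the existing `--vcf`/`--gzvcf`, `--recode`, `--recode-INFO-all` and `--out` options. With `--recode`, vcftools should write `<prefix>.snps.recode.vcf` and `<prefix>.indels.recode.vcf`.

```python
import shutil
import subprocess
from typing import List


def vcftools_split_cmds(vcf_path: str, prefix: str) -> List[List[str]]:
    """Build two vcftools calls: one dropping indels, one keeping only indels."""
    # vcftools reads compressed input via --gzvcf and plain text via --vcf.
    input_flag = "--gzvcf" if vcf_path.endswith(".gz") else "--vcf"
    common = ["vcftools", input_flag, vcf_path, "--recode", "--recode-INFO-all"]
    return [
        common + ["--remove-indels", "--out", prefix + ".snps"],
        common + ["--keep-only-indels", "--out", prefix + ".indels"],
    ]


def run_split(vcf_path: str, prefix: str) -> None:
    """Run both commands, stopping on the first failure."""
    if shutil.which("vcftools") is None:
        raise RuntimeError("vcftools not found on PATH")
    for cmd in vcftools_split_cmds(vcf_path, prefix):
        subprocess.run(cmd, check=True)
```

Note that this path still goes through vcftools' own VCF parser, so it will not help with the polyploidy error mentioned earlier in the thread.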
Regarding the Python script I posted: it does not work well when there are multiple SNPs at the same position. For example, REF=A ALT=C,G is recognised as an indel, while it is actually two SNPs.
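The script itself isn't shown here, so this is an assumption, but one plausible cause is comparing the whole ALT field against REF: `len("C,G")` is 3, not 1. Splitting ALT on commas and judging each allele on its own avoids this. The sketch below uses a hypothetical helper, `classify_alts`. It labels same-length multi-base alleles as `"mnp"` so they are not silently counted as SNPs.

```python
from typing import List


def classify_alts(ref: str, alt_field: str) -> List[str]:
    """Classify each comma-separated ALT allele against REF individually."""
    kinds = []
    for alt in alt_field.split(","):
        if len(alt) != len(ref):
            kinds.append("indel")  # length change relative to REF
        elif len(ref) == 1:
            kinds.append("snp")    # single-base substitution
        else:
            kinds.append("mnp")    # same length, more than one base
    return kinds


print(classify_alts("A", "C,G"))
```

With per-allele labels, a site counts as SNP-only when every label is `"snp"`, so REF=A ALT=C,G is treated as two SNPs rather than an indel.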
Please do not add an answer unless it answers the top-level post. This post is better suited as a comment, and I am moving it to one.