2.5 years ago
WouterDeCoster
Hi,
I'm trying to figure out how the INDEL type is defined in bcftools stats. With some help I found the bcf_get_variant_types function (https://github.com/samtools/htslib/blob/958e6fa708d1914bc46d9f8e9411987402468153/vcf.c#L4247), but it is C and I can't figure out when a variant is considered an indel.
I mostly wonder when a variant is considered an indel vs. a structural variant, for which the length cutoff is usually 50 bp.
If the type hasn't already been defined, the type is set as an array of bytes using
OR
and this function: https://github.com/samtools/htslib/blob/958e6fa708d1914bc46d9f8e9411987402468153/vcf.c#L4162
Yes, but I cannot figure out in which cases the INDEL type is assigned:
So, from a quick look, an allele is an indel when it is not a symbolic allele (<BND>) and the lengths of REF and the current ALT are not the same.
But it requires that there is a match between a part of the REF allele and a part of the ALT allele (https://github.com/samtools/htslib/blob/958e6fa708d1914bc46d9f8e9411987402468153/vcf.c#L4215) to become an INDEL, and if there is no match then an OTHER is assigned?
In the VCF that I'm looking at, the REF allele for an insertion is an N, and likewise the ALT allele for a deletion is an N.
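To make the rule discussed above concrete, here is a minimal Python sketch of it as stated in this thread. It is not htslib's code: the function name classify_allele and the returned labels are made up for illustration, and it assumes that "a match between a part of the REF allele and a part of the ALT allele" means at least one shared leading base, compared case-insensitively. Equal-length and symbolic alleles are lumped together as NOT_INDEL because the thread does not spell out which type they get.

```python
def classify_allele(ref: str, alt: str) -> str:
    """Toy model of the indel rule described in this thread.

    Hypothetical helper for illustration only; it does not reproduce
    htslib's bcf_get_variant_types.
    """
    ref, alt = ref.upper(), alt.upper()

    # Symbolic alleles such as <BND> or <DEL> are never treated as indels here.
    if alt.startswith("<"):
        return "NOT_INDEL"

    # Same length: cannot be an insertion or deletion under this rule.
    if len(ref) == len(alt):
        return "NOT_INDEL"

    # Lengths differ: count the leading bases that REF and ALT share.
    # Assumption: one shared leading base is enough to count as "a match".
    shared = 0
    for r, a in zip(ref, alt):
        if r != a:
            break
        shared += 1

    return "INDEL" if shared > 0 else "OTHER"


if __name__ == "__main__":
    examples = [
        ("A", "ATG"),           # insertion anchored on a matching base
        ("ATG", "A"),           # deletion anchored on a matching base
        ("N", "ATG"),           # insertion with REF given as N
        ("A", "A" + "T" * 60),  # 60 bp insertion, no length cutoff applied
        ("A", "<BND>"),         # symbolic allele
    ]
    for ref, alt in examples:
        shown = alt if len(alt) <= 10 else alt[:10] + "..."
        print(ref, shown, classify_allele(ref, alt))
```

Under these assumptions there is no length cutoff anywhere in the rule, so a 60 bp insertion anchored on a matching base still comes out as INDEL, while an insertion with REF given as N falls through to OTHER. Whether bcftools stats behaves exactly like this should be checked against the linked vcf.c.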