There are many insertions with identical start and end positions in an SV VCF produced by Manta. Are these inversions? I am not sure what these variant entries represent and many of them have long SV lengths reported.
No, they are insertions. END is defined in version 4.3 of the VCF specifications as:
End reference position (1-based), indicating the variant spans positions POS–END on reference/contig CHROM. Normally this is the position of the last base in the REF allele, so it can be derived from POS and the length of REF, and no END INFO field is needed. However when symbolic alleles are used, e.g. in gVCF or structural variants, an explicit END INFO field provides variant span information that is otherwise unknown.
A clean insertion adds bases without removing any reference bases, so it spans a single reference position and POS and END are identical. What you are seeing is as expected.
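To make the rule from the spec concrete, here is a minimal Python sketch of how END can be derived for a record. The function names `parse_info` and `variant_end` are made up for illustration and are not part of any library; the sketch assumes a well-formed INFO string.

```python
def parse_info(info):
    """Parse a VCF INFO string into a dict; flag-style keys map to True."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def variant_end(pos, ref, info):
    """Return the explicit END if INFO has one, else POS + len(REF) - 1."""
    fields = parse_info(info)
    if "END" in fields:
        return int(fields["END"])
    # Without END, the span is defined by the REF allele alone.
    return pos + len(ref) - 1
```

For an insertion with a one-base REF allele, both branches give END equal to POS, which is why these records show identical start and end positions.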
"I am not sure what these variant entries represent and many of them have long SV lengths reported."
For insertions, SVLEN is the number of inserted bases.
Please include an example of such an SV. There are many possible reasons. IIRC, for Manta they're likely to be either insertions or calls in breakpoint notation.
Thanks for your reply.
Ok so here is an example with identical start and end positions without an SV length:
and another with an SV length provided:
The second example is a 'normal' insertion, for which Manta provides the full, exact inserted sequence in the ALT column.
The first example is an insertion longer than the library fragment size. The inserted sequence is too long for Manta to assemble in full, so it cannot report the complete sequence. Instead, Manta uses its own custom LEFT_SVINSSEQ and RIGHT_SVINSSEQ fields to give you the sequences at the start and the end of the insertion. It does not report an SVLEN either, since it only knows that the length is longer than what it can assemble.
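If you want to separate the two kinds of insertion in your VCF, something like the following Python sketch could work. It is a heuristic based only on the INFO fields discussed above; the function name `classify_manta_insertion` and the category labels are my own, and the INFO strings at the bottom are hypothetical, made up for illustration rather than copied from real Manta output.

```python
def classify_manta_insertion(info):
    """Classify a Manta INS record from its INFO string (heuristic sketch)."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        if key and key != ".":
            fields[key] = value if sep else True

    if fields.get("SVTYPE") != "INS":
        return "not an insertion"
    # SVLEN is only reported when the whole insertion was assembled.
    if "SVLEN" in fields:
        return "fully assembled"
    # Otherwise only the flanking ends of the inserted sequence are known.
    if "LEFT_SVINSSEQ" in fields or "RIGHT_SVINSSEQ" in fields:
        return "partially assembled"
    return "unknown"


# Hypothetical INFO strings, invented for this example.
full = "END=1000;SVTYPE=INS;SVLEN=52"
partial = "END=2000;SVTYPE=INS;LEFT_SVINSSEQ=ACGTAC;RIGHT_SVINSSEQ=TTGACA"
```

Records in the "partially assembled" group have no usable length, so any downstream size filter should treat them separately rather than as length zero.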