I'm currently working on generating structural variant (SV) calls using vg call. I constructed the graph from whole-genome assemblies using Minigraph-Cactus (MC) and aligned short-read data using vg giraffe. I would like to compare variant calls generated with vg call to those generated with more standard, linear-reference-based SV callers, but I'm running into the issue that the VCF generated by VG lacks the symbology that seems necessary for certain types of SVs. It seems that some of this symbology can be generated by VG when a reference + VCF is used to construct the graph, but it's less clear to me how one might get these symbols from a multiple-genome reference graph. Does anyone have a suggestion for how to create these symbols as part of the SV calling process, or is there another approach that would be appropriate for comparing SV calls generated by vg call with those from other, linear-reference-based SV callers?
Can you describe what you mean by the vg VCF "lacking the symbology" for the SVs? I haven't used the graph tools, so I'm not familiar with them myself, but I'm curious. If it is outputting breakend-style VCF records, those are tricky to deal with...
Sure, the VCF I generated with vg call includes a description of the variant in the ALT field in the form of a nucleotide sequence. This is probably fine for comparing insertion and deletion variants across VCFs, but in the case of inversions or translocations I don't think it's so easy to compare approaches without the symbol. What I mean by symbol is the <INV>, <INS>, or <DEL> that is present either in the ALT field or somewhere else in the VCF.
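To make the distinction concrete, here is a rough Python sketch of how a label such as INS, DEL, or INV could in principle be derived from the explicit REF and ALT sequences. This is only an illustration, not something vg provides: the function names and the 50 bp size cutoff are my own assumptions, and the inversion check only fires when ALT is the exact reverse complement of REF, which real calls will often not satisfy. Translocations cannot be recognized this way at all.

```python
# Hypothetical helper, not part of vg: label a sequence-resolved allele.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def classify_sv(ref, alt, min_len=50):
    """Return 'INS', 'DEL', 'INV', or None for explicit REF/ALT sequences.

    min_len is an assumed size cutoff for calling something an SV.
    """
    ref, alt = ref.upper(), alt.upper()
    diff = len(alt) - len(ref)
    if diff >= min_len:
        return "INS"  # ALT is longer than REF by at least min_len
    if -diff >= min_len:
        return "DEL"  # ALT is shorter than REF by at least min_len
    if (
        diff == 0
        and len(ref) >= min_len
        and alt != ref
        and alt == reverse_complement(ref)
    ):
        return "INV"  # same length, exact reverse complement only
    return None  # small variant or something this sketch cannot label


if __name__ == "__main__":
    print(classify_sv("A", "A" + "C" * 60))
    print(classify_sv("A" + "C" * 60, "A"))
```

The returned label could then be written to an SVTYPE INFO field so that tools expecting that annotation have something to match on, but the exact-match inversion rule would need to be relaxed for real data.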