Hi vg team,
I followed the instructions in the guide "Working with a whole genome variation graph" to construct my own variation graph. After constructing the graph, I wanted to check whether all the structural variations (SVs) in my input.vcf file had been carried over into the graph. My approach was to use the vg deconstruct command to generate a new VCF file called new.vcf, and then compare the two VCF files to check whether they contained the same set of SVs, or at least a similar set.
However, when I compared the SVs for the chromosome "NC_058080.1_1" between the original input.vcf file and the new.vcf file, I noticed a significant difference. The input.vcf file contained 101,242 SVs for this chromosome, whereas the new.vcf file had only 441 SVs for the same chromosome.
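For reference, a per-chromosome SV count like the one above could be reproduced with a short Python script. This is a minimal sketch, not the exact method used for the numbers in this post: the helper names (open_vcf, is_sv, count_svs) are hypothetical, and the definition of an SV is an assumption (a record with an SVTYPE INFO tag, a symbolic or breakend ALT allele, or a REF/ALT length difference of at least 50 bp).

```python
import gzip
from collections import Counter

MIN_SV_LEN = 50  # assumed size threshold; the post does not state one


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def is_sv(ref, alts, info, min_len=MIN_SV_LEN):
    """Assumed SV definition: SVTYPE tag, symbolic/breakend ALT, or large indel."""
    if "SVTYPE=" in info:
        return True
    for alt in alts.split(","):
        if alt in ("*", "."):
            continue  # spanning-deletion or missing allele, not an SV by itself
        if alt.startswith("<") or "[" in alt or "]" in alt:
            return True  # symbolic allele such as <DEL>, or a breakend
        if abs(len(alt) - len(ref)) >= min_len:
            return True
    return False


def count_svs(path):
    """Return a Counter mapping chromosome name to number of SV records."""
    counts = Counter()
    with open_vcf(path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 8:
                continue  # skip malformed records
            chrom, ref, alts, info = fields[0], fields[3], fields[4], fields[7]
            if is_sv(ref, alts, info):
                counts[chrom] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_svs.py input.vcf.gz new.vcf
    for vcf_path in sys.argv[1:]:
        print(vcf_path, count_svs(vcf_path).get("NC_058080.1_1", 0))
```

Matching record counts would only be a coarse first check; they do not show that the same variants are present in both files.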
I'm unsure at which step I might have made a mistake. To provide a clearer picture, I will list all the commands I used. Hopefully, this information will help identify any potential errors or issues in the process.
vg version: v1.48.0 "Gallipoli"
input vcf: input.vcf, which contains unphased SVs
# graph construct
vg construct -f -S -a -t 1 -R NC_058080.1_1 -r ref.fna -v input.vcf.gz > NC_058080.1_1.graph_div_12bufo.vg
# deconstruct
vg deconstruct -t 16 --verbose -a NC_058080.1_1.graph_div_12bufo.vg > new.vcf
Looking forward to your help. Thanks in advance!
Maxine
Thank you for your response. I still have some questions regarding Giraffe. In my case, the input.vcf file is unphased and contains numerous structural variations (SVs), so overlaps between SVs are quite common. According to the guide "Mapping short reads with Giraffe", Giraffe might not be the most suitable approach for mapping in this situation. My concern now is whether this will also affect the accuracy of vg deconstruct. If it could, I'd like to know whether there is a better way to test the quality and accuracy of the graph.vg file I constructed. Any guidance or suggestions would be greatly appreciated.