I have a GFA graph built with PGGB using several samples. I want to genotype some other samples with short reads using VG Giraffe.
After investigating how to generate the corresponding indexes for VG Giraffe from a GFA generated with PGGB, I think I have found a way to do it. However, the node IDs obtained after VG call do not correspond to the node IDs of the original GFA. Is there any alternative way to the one I did to solve this problem?
Here how I created the indexes and how I used VG giraffe and VG call:
vg autoindex --workflow giraffe -p prefix -g pggb.gfa -R XG
vg giraffe -Z pggb.giraffe.gbz -x pggb.xg -f reads1_1.fastq -f reads1_2.fastq > graph.gam
vg pack -x pggb.xg -g graph.gam -o graph.pack
vg call pggb.xg -k graph.pack -p reference_path > graph.vcf
Here the vcf generated after VG Call:
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
chr1 119 >4>6 AG A 319.823 PASS AT=>4>5>6,>4>6;DP=14 GT:DP:AD:GL:GQ:GP:XD:MAD 1/1:14:0,14:-33.191784,-5.222199,-1.686644:35:-1.098904:14.409575:14
chr1 3881 >123>126 T A 731.864 PASS AT=>123>124>126,>123>125>126;DP=32 GT:DP:AD:GL:GQ:GP:XD:MAD 1/1:32:0,32:-74.786923,-10.856443,-2.077491:87:-1.098612:31.617285:32
chr1 9995 >317>320 A T 1098.81 PASS AT=>317>318>320,>317>319>320;DP=48 GT:DP:AD:GL:GQ:GP:XD:MAD 1/1:48:0,48:-111.791357,-15.895637,-2.387109:135:-1.098612:43.424244:48
And here the vcf generated with VG deconstruct from the gfa graph (note that the positions are the same but the node IDs are different).
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
Chr 1119 >1>3 AG A 60 . AT=>1>2>3,>1>3
Chr1 3881 >3>6 T A 60 . AT=>3>4>6,>3>5>6
Chr1 9995 >6>9 A T 60 . AT=>6>7>9,>6>8>9
Thanks in advance
I'll also add that you should not be using the
-R
option invg autoindex
. There's a reason that this option is not included in the help output forvg autoindex
. We don't maintain stable behavior for it, and the results can be unpredictable unless you are familiar with the inner workings ofvg autoindex
. It's really more of a developer tool. If you want an XG file that matches your GBZ, you should usevg convert -x
. In addition, most VG commands whose help documentation saysxg
actually accept other graph formats (including GBZ) as well. The "XG" language is mostly legacy from when the formats were more rigidly enforced.Thanks a lot for your help. In fact, the main problem is that when converting the GFA file from pggb to GBZ format with
vg autoindex
, I lose the reference paths that I had in the GFA. That's why I included the XG file to try to solve the problem.That is, when I type
vg paths -Lx graph.gfa
, I get something like this, that correspond to my original assemblies:But when I type
vg paths -Lx graph.gbz
, I get this output that I can't understand:There is any way to keep the reference paths when converting from GFA to GBZ?