Hi,
I am observing very different surjections for the same read in two versions of vg: 1.46 and 1.62. The behavior in 1.46 seems to me more logical, I would appreciate if someone knows the difference in surject
behavior between the two versions and how to recover the behavior of 1.46.
Specifically, I am aligning the following read:
@HC_chr1:38618538_0
GAGGTCAGGAGTTCGAGACCAGCCTGACCAACTGGAGAAACCCCGTCTCTACTAAAAATACAAAATTAGCCGGGCGTGATGGTACATGCCTGTAATCCCAGCTACTCGGGGGGGCTGAGGCAGGAGAATCACTTGAACCTGGGAGGCGGAGGTTGCGGTGAGCTG
AAATCGCACCATTGCAC
+
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
BBBBBBBBBBBBBBBBB
in 1.46 I use the following command for mapping
vg giraffe \
--progress \
--sample sample \
--output-format gaf \
-f sample.fq \
-Z hprc-v1.0-mc-chm13-minaf.0.1.gbz \
-d hprc-v1.0-mc-chm13-minaf.0.1.dist \
-m hprc-v1.0-mc-chm13-minaf.0.1.min \
-t 1
in 1.62:
vg giraffe --progress --sample 030945_assembly_vg_realigned \
-f /241205/sample.fq \
--output-format gaf \
-Z /data/Runs/genomes/broad-references/hg38/v0/giraffe_index_1.1_af/hprc-v1.1-mc-grch38.d9.gbz \
-d /data/Runs/genomes/broad-references/hg38/v0/giraffe_index_1.1_af/hprc-v1.1-mc-grch38.d9.dist \
-m /data/Runs/genomes/broad-references/hg38/v0/giraffe_index_1.1_af/hprc-v1.1-mc-grch38.d9.min \
-t 1
The GAF files that I receive in the two cases are, as far as I understand, very similar:
1.46
HC_chr1:38618538_0 182 0 182 + >2118432>2118433>2118434>2118451 239 17 198 182 182 60 AS:i:192 bq:Z:BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB cs:Z::182 dv:f:0
1.62
HC_chr1:38618538_0 182 0 182 + >466858>466859>466860>466877 233 17 199 182 182 60 AS:i:192 bq:Z:BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB cs:Z::182 dv:f:0
However, when I am running vg surject
to project on hg38, I get different results:
1.46
HC_chr1:38618538_0 0 GRCh38.chr1 38618438 60 114M1604D68M * 0 0 GAGGTCAGGAGTTCGAGACCAGCCTGACCAACTGGAGAAACCCCGTCTCTACTAAAAATACAAAATTAGCCGGGCGTGATGGTACATGCCTGTAATCCCAGCTACTCGGGGGGGCTGAGGCAGGAGAATCACTTGAACCTGGGAGGCGGAGGTTGCGGTGAGCTGAAATCGCACCATTGCAC BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
1.62
HC_chr1:38618538_0 0 GRCh38#0#chr1 38618438 60 163M19S * 0 0 GAGGTCAGGAGTTCGAGACCAGCCTGACCAACTGGAGAAACCCCGTCTCTACTAAAAATACAAAATTAGCCGGGCGTGATGGTACATGCCTGTAATCCCAGCTACTCGGGGGGGCTGAGGCAGGAGAATCACTTGAACCTGGGAGGCGGAGGTTGCGGTGAGCTGAAATCGCACCATTGCAC BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB AS:i:158
I am using the same vg surject
command as far as I understand.
The first alignment makes more sense to me since the read spans a common deletion (1604D) in the population, whereas in 1.62 surjection is not able to recover it.
Thank you all for the help!
Ilya
```
You are also using different graphs with vg 1.46 and 1.62. The graph you use with vg 1.46 is a v1.0 graph built around CHM13, while the graph you use with 1.62 is a v1.1 graph built around GRCh38.
What happens if you surject the alignments from vg 1.46 using vg 1.62? And the other way around?
Thanks Jouni Sirén , that was a very informative one. Now I think I am convinced that this is just the difference between
vg surject
in the two versions ofvg
and the change occurred between versions 1.47 and 1.48.Here is the logic:
For consistency I aligned the same read
using v1.62 and a graph
hprc-v1.1-mc-chm13.d9.gbz
.with this command:
and performed
vg surject
with 1.62 and 1.46 with this command:With 1.46 I get the good alignment:
While with 1.62 I get the following less useful alignment:
I get the same behavior when aligning to the v1.0 graph. 1.46 produces good surjection 1.62 - not so much. Playing with versions, I see that the change occured between version 1.47 and 1.48.
Would be happy to hear suggestions since the old behavior was useful in some cases!
Thanks,
Ilya