Unexpected behavior of `vg surject`
0
0
Entering edit mode
5 weeks ago
Ilya • 0

Hi,

I am observing very different surjections for the same read in two versions of vg: 1.46 and 1.62. The behavior in 1.46 seems to me more logical, I would appreciate if someone knows the difference in surject behavior between the two versions and how to recover the behavior of 1.46.

Specifically, I am aligning the following read:

@HC_chr1:38618538_0
GAGGTCAGGAGTTCGAGACCAGCCTGACCAACTGGAGAAACCCCGTCTCTACTAAAAATACAAAATTAGCCGGGCGTGATGGTACATGCCTGTAATCCCAGCTACTCGGGGGGGCTGAGGCAGGAGAATCACTTGAACCTGGGAGGCGGAGGTTGCGGTGAGCTG
AAATCGCACCATTGCAC
+
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
BBBBBBBBBBBBBBBBB

in 1.46 I use the following command for mapping

vg giraffe \
        --progress \
        --sample sample \
        --output-format gaf \
        -f sample.fq \
        -Z hprc-v1.0-mc-chm13-minaf.0.1.gbz \
        -d hprc-v1.0-mc-chm13-minaf.0.1.dist \
        -m hprc-v1.0-mc-chm13-minaf.0.1.min \
        -t 1

in 1.62:

vg giraffe --progress --sample 030945_assembly_vg_realigned \
-f /241205/sample.fq \
--output-format gaf \
-Z /data/Runs/genomes/broad-references/hg38/v0/giraffe_index_1.1_af/hprc-v1.1-mc-grch38.d9.gbz \
-d /data/Runs/genomes/broad-references/hg38/v0/giraffe_index_1.1_af/hprc-v1.1-mc-grch38.d9.dist \
-m /data/Runs/genomes/broad-references/hg38/v0/giraffe_index_1.1_af/hprc-v1.1-mc-grch38.d9.min \
-t 1

The GAF files that I receive in the two cases are, as far as I understand, very similar:

1.46
HC_chr1:38618538_0  182 0   182 +   >2118432>2118433>2118434>2118451    239 17  198 182 182 60  AS:i:192    bq:Z:BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB cs:Z::182   dv:f:0

1.62
HC_chr1:38618538_0  182 0   182 +   >466858>466859>466860>466877    233 17  199 182 182 60  AS:i:192    bq:Z:BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB cs:Z::182   dv:f:0

However, when I am running vg surject to project on hg38, I get different results:

1.46
HC_chr1:38618538_0  0   GRCh38.chr1 38618438    60  114M1604D68M    *   0   0   GAGGTCAGGAGTTCGAGACCAGCCTGACCAACTGGAGAAACCCCGTCTCTACTAAAAATACAAAATTAGCCGGGCGTGATGGTACATGCCTGTAATCCCAGCTACTCGGGGGGGCTGAGGCAGGAGAATCACTTGAACCTGGGAGGCGGAGGTTGCGGTGAGCTGAAATCGCACCATTGCAC  BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

1.62
HC_chr1:38618538_0  0   GRCh38#0#chr1   38618438    60  163M19S *   0   0   GAGGTCAGGAGTTCGAGACCAGCCTGACCAACTGGAGAAACCCCGTCTCTACTAAAAATACAAAATTAGCCGGGCGTGATGGTACATGCCTGTAATCCCAGCTACTCGGGGGGGCTGAGGCAGGAGAATCACTTGAACCTGGGAGGCGGAGGTTGCGGTGAGCTGAAATCGCACCATTGCAC  BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB  AS:i:158

I am using the same vg surject command as far as I understand.

The first alignment makes more sense to me since the read spans a common deletion (1604D) in the population, whereas in 1.62 surjection is not able to recover it.

Thank you all for the help!

Ilya

```

surject vg • 279 views
ADD COMMENT
0
Entering edit mode

You are also using different graphs with vg 1.46 and 1.62. The graph you use with vg 1.46 is a v1.0 graph built around CHM13, while the graph you use with 1.62 is a v1.1 graph built around GRCh38.

What happens if you surject the alignments from vg 1.46 using vg 1.62? And the other way around?

ADD REPLY
0
Entering edit mode

Thanks Jouni Sirén , that was a very informative one. Now I think I am convinced that this is just the difference between vg surject in the two versions of vg and the change occurred between versions 1.47 and 1.48.

Here is the logic:

For consistency I aligned the same read

@HC_chr1:38618538_0
GAGGTCAGGAGTTCGAGACCAGCCTGACCAACTGGAGAAACCCCGTCTCTACTAAAAATACAAAATTAGCCGGGCGTGATGGTACATGCCTGTAATCCCAGCTACTCGGGGGGGCTGAGGCAGGAGAATCACTTGAACCTGGGAGGCGGAGGTTGCGGTGAGCTG
AAATCGCACCATTGCAC
+
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
BBBBBBBBBBBBBBBBB

using v1.62 and a graph hprc-v1.1-mc-chm13.d9.gbz.

with this command:

vg giraffe --progress --sample 030945_assembly_vg_realigned \
-f /241205/sample.fq \
--output-format gaf \
-Z hprc-v1.1-mc-chm13.d9.gbz \
-d hprc-v1.1-mc-chm13.d9.dist \
-m hprc-v1.1-mc-chm13.d9.min \
-t 1

and performed vg surject with 1.62 and 1.46 with this command:

vg surject \
    -x hprc-v1.1-mc-chm13.d9.gbz \
    -F /241205/paths_list.chm13.d9.txt \
    -s /241205/sample.1.1.chm13.gaf -G

With 1.46 I get the good alignment:

HC_chr1:38618538_0  0   GRCh38#0#chr1   38618438    60  126M1604D56M    *   0   0   GAGGTCAGGAGTTCGAGACCAGCCTGACCAACTGGAGAAACCCCGTCTCTACTAAAAATACAAAATTAGCCGGGCGTGATGGTACATGCCTGTAATCCCAGCTACTCGGGGGGGCTGAGGCAGGAGAATCACTTGAACCTGGGAGGCGGAGGTTGCGGTGAGCTGAAATCGCACCATTGCAC  BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

While with 1.62 I get the following less useful alignment:

HC_chr1:38618538_0  0   GRCh38#0#chr1   38618438    60  163M19S *   0   0   GAGGTCAGGAGTTCGAGACCAGCCTGACCAACTGGAGAAACCCCGTCTCTACTAAAAATACAAAATTAGCCGGGCGTGATGGTACATGCCTGTAATCCCAGCTACTCGGGGGGGCTGAGGCAGGAGAATCACTTGAACCTGGGAGGCGGAGGTTGCGGTGAGCTGAAATCGCACCATTGCAC  BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB  AS:i:158

I get the same behavior when aligning to the v1.0 graph. 1.46 produces good surjection 1.62 - not so much. Playing with versions, I see that the change occured between version 1.47 and 1.48.

Would be happy to hear suggestions since the old behavior was useful in some cases!

Thanks,

Ilya

ADD REPLY

Login before adding your answer.

Traffic: 2416 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6