Issues with vg surject into paths
1
0
Entering edit mode
7 days ago
Rugare • 0

Hello, I essentially am aiming to calculate the mean mapping qualities of patient FASTQs to different ethnicity genomes in the pangenome. HG1109 is one of the genomes in hprc-v1.1-mc-grch38.full.gbz and I'm using that to test my code. SRR32568286 is a random human FASTQ (tp53 amplicon) from SRA that I'm also using to test code.

I got HG1109 paths with

vg paths -x /path/to/hprc-v1.1-mc-grch38.full.gbz -L > /path/to/hprc-v1.1-mc-grch38.full.paths 
grep HG1109  /path/to/hprc-v1.1-mc-grch38.full.paths > HG1109.paths

Next, I have generated a GAM file with the following command:

vg giraffe -Z /path/to/hprc-v1.1-mc-grch38.full.gbz -f path/to/SRR32568286.fastq.gz -t 128 --progress -o gam > SRR32568286.gam

Next I try to surject this GAM into a BAM file with this command.

vg surject --progress -b -t 64 -x /path/to/hprc-v1.1-mc-grch38.full.gbz -F /path/to/HG1109.paths /path/to/SRR32568286.gam > /path/to/SRR32568286.bam

This gives me the following error (non ASCII characters removed for Biostars):

Loading graph...
Applying overlay...
Finding paths...
terminate called after throwing an instance of 'std::out_of_range'
  what():  _Map_base::at
---------------------------
Crash report for vg v1.63.1 "Boccaleone"
Caught signal 6 raised at address 0x22bce6c; tracing with backward-cpp
Stack trace (most recent call last):
#17   Object "/cephfs/volumes/hpc_home/k2478847/7eea5dc7-543f-4530-95f5-2fe4dc97b0e6/miniconda3/bin/vg", at 0x65e4f4, in _start
#16   Object "/cephfs/volumes/hpc_home/k2478847/7eea5dc7-543f-4530-95f5-2fe4dc97b0e6/miniconda3/bin/vg", at 0x2279136, in __libc_start_main
#15   Object "/cephfs/volumes/hpc_home/k2478847/7eea5dc7-543f-4530-95f5-2fe4dc97b0e6/miniconda3/bin/vg", at 0x2277899, in __libc_start_call_main
#14   Object "/cephfs/volumes/hpc_home/k2478847/7eea5dc7-543f-4530-95f5-2fe4dc97b0e6/miniconda3/bin/vg", at 0xf2d5fb, in vg::subcommand::Subcommand::operator()(int, char**) const
#13   Object "/cephfs/volumes/hpc_home/k2478847/7eea5dc7-543f-4530-95f5-2fe4dc97b0e6/miniconda3/bin/vg", at 0xf31e35, in main_surject(int, char**)
#12   Object "/cephfs/volumes/hpc_home/k2478847/7eea5dc7-543f-4530-95f5-2fe4dc97b0e6/miniconda3/bin/vg", at 0x11fc812, in vg::get_sequence_dictionary(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, handlegraph::PathPositionHandleGraph const&)
#11   Object "/cephfs/volumes/hpc_home/k2478847/7eea5dc7-543f-4530-95f5-2fe4dc97b0e6/miniconda3/bin/vg", at 0x15e30f7, in vg::get_input_file(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void (std::istream&)>)
#10   Object "/cephfs/volumes/hpc_home/k2478847/7eea5dc7-543f-4530-95f5-2fe4dc97b0e6/miniconda3/bin/vg", at 0x11f8b02, in vg::get_sequence_dictionary(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, handlegraph::PathPositionHandleGraph const&)::{lambda(std::istream&)#2}::operator()(std::istream&) const
#9    Object "/cephfs/volumes/hpc_home/k2478847/7eea5dc7-543f-4530-95f5-2fe4dc97b0e6/miniconda3/bin/vg", at 0x11f804b, in vg::get_sequence_dictionary(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, handlegraph::PathPositionHandleGraph const&)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const*)#1}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const*) const
#8    Object "/cephfs/volumes/hpc_home/k2478847/7eea5dc7-543f-4530-95f5-2fe4dc97b0e6/miniconda3/bin/vg", at 0x19dcb33, in bdsg::ReferencePathOverlay::get_path_length(handlegraph::path_handle_t const&) const
#7    Object "/cephfs/volumes/hpc_home/k2478847/7eea5dc7-543f-4530-95f5-2fe4dc97b0e6/miniconda3/bin/vg", at 0x61d0d3, in std::__throw_out_of_range(char const*)
#6    Object "/cephfs/volumes/hpc_home/k2478847/7eea5dc7-543f-4530-95f5-2fe4dc97b0e6/miniconda3/bin/vg", at 0x21b16b8, in __cxa_throw
#5    Object "/cephfs/volumes/hpc_home/k2478847/7eea5dc7-543f-4530-95f5-2fe4dc97b0e6/miniconda3/bin/vg", at 0x21b1556, in std::terminate()
#4    Object "/cephfs/volumes/hpc_home/k2478847/7eea5dc7-543f-4530-95f5-2fe4dc97b0e6/miniconda3/bin/vg", at 0x21b14eb, in __cxxabiv1::__terminate(void (*)())
#3    Object "/cephfs/volumes/hpc_home/k2478847/7eea5dc7-543f-4530-95f5-2fe4dc97b0e6/miniconda3/bin/vg", at 0x61b3eb, in __gnu_cxx::__verbose_terminate_handler() [clone .cold]
#2    Object "/cephfs/volumes/hpc_home/k2478847/7eea5dc7-543f-4530-95f5-2fe4dc97b0e6/miniconda3/bin/vg", at 0x61db33, in abort
#1    Object "/cephfs/volumes/hpc_home/k2478847/7eea5dc7-543f-4530-95f5-2fe4dc97b0e6/miniconda3/bin/vg", at 0x2290315, in raise
#0    Object "/cephfs/volumes/hpc_home/k2478847/7eea5dc7-543f-4530-95f5-2fe4dc97b0e6/miniconda3/bin/vg", at 0x22bce6c, in __pthread_kill
Library locations:
ERROR: Signal 6 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug.
---------------------------
Context dump:
        Thread 0: Starting 'surject' subcommand
Found 1 threads with context.
---------------------------
Please include this entire error log in your bug report!
---------------------------

Anyone able to advise on this? Or some other way to achieve my aim with vg tools? ps. I get the same error when I try vg giraffe with -o BAM, and all command run fine if I exclude --ref-paths

Thank you in advance.

vg • 397 views
ADD COMMENT
2
Entering edit mode
6 days ago
Jouni Sirén ▴ 630

I guess vg surject crashes because you are trying to project the alignments to non-reference paths. GBZ graphs do not expose haplotype paths through older interfaces (such as the for_each_path_handle() used when building a ReferencePathOverlay). Code that uses those interfaces does not expect a large number of haplotypes (e.g. thousands) and may not be able to handle them properly.

Changing reference samples in a GBZ graph is simple:

vg gbwt --set-reference HG1109 -g output.gbz --gbz-format -Z input.gbz

This may change path names, so you'll have to rebuild the path list.

You can also replace the grep command with option -S HG1109 in the vg paths command.

ADD COMMENT
0
Entering edit mode

That's very helpful, thank you!

ADD REPLY

Login before adding your answer.

Traffic: 2503 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6