vg view and vg surject both crashing after running giraffe
12 months ago
Jen • 0

Hello,

I have a genome graph that I created with pggb. I indexed it using:

vg autoindex --workflow giraffe -g IN.gfa -t 127

Then I ran giraffe to align short reads:

./vg giraffe -p -Z index.giraffe.gbz -d index.dist -m index.min -f R1.fastq -f R2.fastq > IDB4.gam

This worked, although it did default back to single-end mapping. Here is example output from vg stats -a IDB4.gam:

Total alignments: 758
Total primary: 758
Total secondary: 0
Total aligned: 758
Total perfect: 608
Total gapless (softclips allowed): 755
Total paired: 758
Total properly paired: 520
Alignment score: mean 156.467, median 161, stdev 13.3414, max 161 (469 reads)
Mapping quality: mean 49.715, median 60, stdev 19.6857, max 60 (575 reads)
Insertions: 1 bp in 1 read events
Deletions: 4 bp in 3 read events
Substitutions: 328 bp in 328 read events
Softclips: 694 bp in 16 read events
Total time: 70.6228 seconds
Speed: 10.7331 reads/second
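
As a quick sanity check on these numbers, the paired and properly paired fractions can be pulled out of the vg stats -a text with awk. This is only a sketch: the function name pair_fractions is made up for illustration, and it assumes the summary keeps the "Total alignments:", "Total paired:" and "Total properly paired:" labels shown above.

```shell
# Hypothetical helper: reads `vg stats -a` text on stdin and prints the
# paired and properly paired percentages of all alignments.
pair_fractions() {
  awk -F': ' '
    /^Total alignments:/      { total = $2 }
    /^Total paired:/          { paired = $2 }
    /^Total properly paired:/ { proper = $2 }
    END {
      # Print nothing if the total is missing or zero.
      if (total > 0)
        printf "paired=%.1f%% properly_paired=%.1f%%\n", 100 * paired / total, 100 * proper / total
    }'
}

# Example using the three relevant lines from the output above.
printf '%s\n' \
  'Total alignments: 758' \
  'Total paired: 758' \
  'Total properly paired: 520' | pair_fractions
```

For the figures above this prints paired=100.0% properly_paired=68.6%, so every alignment in this GAM is counted as paired. Against the real file it would presumably be run as ./vg stats -a IDB4.gam | pair_fractions.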

However, now I really need to go from this GAM back to my pggb pangenome. My first thought was to visualise the alignments and surject them back onto one of my reference paths. However, neither command worked.

./vg view -d -F cuc11-chr5.gfa -A vg_giraffe_alignments/IDB4.gam | dot -Tsvg -o aln.svg

The result was this error:

Error: <stdin>: syntax error in line 11987 scanning a HTML string
(missing '>'? bad nesting? longer than 16384?) String starting:<<TABLE
BORDER="0" CELLPADDING="0" CELLSPACING="0"><TR><TD PORT="nw"></TD><TD
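
The Graphviz message itself hints that an HTML-like label may be longer than 16384 characters. One way to check is to write the dot text to a file first and look for the longest line in it. A minimal sketch: longest_line is a hypothetical helper and aln.dot is an assumed file name, neither is part of the commands above, and it only helps if the oversized label sits on a single line.

```shell
# Hypothetical helper: print "LINE_NUMBER LENGTH" for the longest line of a file.
longest_line() {
  awk '
    length($0) > max { max = length($0); line = NR }
    END { if (NR > 0) print line, max }
  ' "$1"
}

# Assumed usage: save the dot output instead of piping it straight into dot,
# then compare the reported length against the 16384 limit from the error.
#   ./vg view -d -F cuc11-chr5.gfa -A vg_giraffe_alignments/IDB4.gam > aln.dot
#   longest_line aln.dot
```

If the longest line is close to or above 16384 characters and sits near line 11987, that would be consistent with the label length being the problem rather than the GAM itself.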

When I tried running surject with:

./vg surject -x index.giraffe.gbz -p pathname -b vg_giraffe_alignments/IDB4.gam > IDB4_pathname.bam

The result was very slow, and eventually my instance crashed because it ran out of memory. However, I'm running on a machine with 128 cores and 1.9 TB, so the amount of memory used here seems excessive. Is there a way to do this more efficiently? Alternatively, does anyone know of a way to go from a GAM to a BED, or from a GAM to an odgi extraction?

Thanks very much!


Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text). For multi-line code, use either (a) the code option in the formatting bar or (b) fenced code blocks, which also enable syntax highlighting. If your code has a single command on a long line, break it into multiple lines with proper escape sequences so it's easier to read and still runs when copy-pasted. I've done it for you this time.

Please DO NOT use the double quote option - that is for quoting (citing) a source verbatim, not for formatting code. It mangles code content by removing existing new lines/inserting new lines where it deems appropriate.
