vg index memory allocation
3.9 years ago
kingcohn ▴ 30

I'm attempting to call variants using a reference graph. I generated the graph first with minigraph:

    ./minigraph -xggs -t16 f1.fa f2.fa f3.fa f4.fa f5.fa > PG_1.gfa

and then converted it with vg:

    ./vg convert -g -a PG_1.gfa > PG_2.vg

When I try to index, vg prompts me to chop the nodes to at most 256 bp:

    ./vg mod -X 256 PG_2.vg > PG_3.vg

but indexing the chopped graph stalls:

    ./vg index -p -x PG_3.xg -g PG_3.gcsa PG_3.vg
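The steps above can be collected into one script, which makes the pipeline reproducible and lets you point $TMPDIR at a large scratch disk (the answer below notes that GCSA2 construction writes its temporary files there). This is a minimal sketch, not an official vg workflow: `build_pipeline`, `run_pipeline` and the `scratch` directory are hypothetical names, and it assumes `./minigraph` and `./vg` sit in the working directory as in the question. Note that `-p` is placed before `-x`, because `-x` takes an argument and would otherwise receive `-p` as the XG file name.

```python
"""Re-run the question's minigraph -> vg pipeline as one script (a sketch).

Assumptions: ./minigraph and ./vg exist in the working directory; `scratch`
is a hypothetical directory on a disk with a few hundred GB free, exported
as TMPDIR so that temporary index files land there.
"""
import os
import subprocess
import sys


def build_pipeline(fastas, threads=16):
    """Return (argv, stdout_path) pairs; stdout_path is None when no redirect."""
    return [
        (["./minigraph", "-xggs", f"-t{threads}", *fastas], "PG_1.gfa"),
        (["./vg", "convert", "-g", "-a", "PG_1.gfa"], "PG_2.vg"),
        (["./vg", "mod", "-X", "256", "PG_2.vg"], "PG_3.vg"),
        # -p (progress) comes before -x so that -x gets "PG_3.xg" as its value.
        (["./vg", "index", "-p", "-x", "PG_3.xg", "-g", "PG_3.gcsa", "PG_3.vg"], None),
    ]


def run_pipeline(steps, scratch, dry_run=False):
    """Run each step in order, stopping at the first failure (check=True)."""
    env = dict(os.environ, TMPDIR=scratch)
    for argv, out_path in steps:
        # Echo the equivalent shell command so the log shows what ran.
        print(" ".join(argv) + (f" > {out_path}" if out_path else ""), file=sys.stderr)
        if dry_run:
            continue
        if out_path:
            with open(out_path, "wb") as out:
                subprocess.run(argv, stdout=out, env=env, check=True)
        else:
            subprocess.run(argv, env=env, check=True)
```

Calling `run_pipeline(build_pipeline(["f1.fa", "f2.fa", "f3.fa", "f4.fa", "f5.fa"]), "/path/to/scratch", dry_run=True)` prints the commands without running them; drop `dry_run` to execute.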

The output stops at:

    Building XG index
    Saving XG index to Ldec_vg.xg
    Generating kmer files...
    Building the GCSA2 index...
    InputGraph::InputGraph(): 2193921420 kmers in 1 file(s)

The PG_3.vg input is 1.3 GB. How much memory and how many CPUs should I use to build the index and call variants with FASTQ reads of approximately 1.5-2 GB?

3.9 years ago
Jouni Sirén ▴ 450

Assuming that the graph is not too complex locally (in a 256 bp window), ~2 billion initial kmers in a single graph file should require 100-200 GB of memory and 200-300 GB of disk space in $TMPDIR.
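The rule of thumb above can be turned into a quick back-of-the-envelope calculation. This is a minimal sketch under an assumption that is not stated in the answer or the vg documentation: memory and temporary disk use are taken to scale linearly with the initial kmer count, anchored at ~2 billion kmers for 100-200 GB of RAM and 200-300 GB of $TMPDIR space. `estimate_resources` is a hypothetical helper name.

```python
"""Rough GCSA2 resource estimate scaled from the rule of thumb above.

Assumption (not from the vg documentation): RAM and temporary disk usage
grow linearly with the number of initial kmers, and the graph is not too
complex locally. GB here means 10**9 bytes.
"""

REF_KMERS = 2e9               # "~2 billion initial kmers"
REF_RAM_GB = (100.0, 200.0)   # memory range quoted for REF_KMERS
REF_DISK_GB = (200.0, 300.0)  # $TMPDIR range quoted for REF_KMERS


def estimate_resources(kmers: int) -> dict:
    """Return (low, high) GB ranges for RAM and $TMPDIR space."""
    scale = kmers / REF_KMERS
    return {
        "ram_gb": tuple(round(x * scale, 1) for x in REF_RAM_GB),
        "disk_gb": tuple(round(x * scale, 1) for x in REF_DISK_GB),
    }


if __name__ == "__main__":
    # Kmer count reported by InputGraph::InputGraph() in the question's log.
    print(estimate_resources(2_193_921_420))
```

For the 2,193,921,420 kmers in the question's log this gives roughly 110-220 GB of RAM and 220-330 GB of temporary disk. Treat these as order-of-magnitude figures: if the graph is locally complex, the low-complexity assumption behind the original numbers no longer holds.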

GCSA construction uses a semi-external algorithm that works best when the graph is partitioned (e.g. by chromosome) into multiple .vg files. It can then reduce the memory usage significantly by loading kmers from one graph file at a time.
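One way to get multiple graph files out of a single minigraph GFA is to split it into connected components, which in a chromosome-scale pangenome often (but not necessarily) correspond to chromosomes. The sketch below is a hypothetical helper (`partition_gfa` and `split_gfa.py` are illustrative names, not part of vg or minigraph). It runs union-find over the S and L lines and assumes GFA1 input where connectivity is given only by those two record types; any other record types are dropped.

```python
"""Split a GFA file into one file per connected component (hypothetical helper).

Assumptions: GFA1 text input; only S (segment) and L (link) lines define
connectivity; other record types (H, P, W, ...) are dropped for brevity.
"""
import sys
from collections import defaultdict


class DisjointSet:
    """Union-find with path halving and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def add(self, x):
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def partition_gfa(lines):
    """Group S and L lines by connected component, largest component first."""
    dsu = DisjointSet()
    segments, links = [], []
    for line in lines:
        text = line.rstrip("\n") + "\n"  # normalise the trailing newline
        fields = text.rstrip("\n").split("\t")
        if fields[0] == "S":
            dsu.add(fields[1])
            segments.append((fields[1], text))
        elif fields[0] == "L":
            dsu.add(fields[1])
            dsu.add(fields[3])
            dsu.union(fields[1], fields[3])
            links.append((fields[1], text))
    groups = defaultdict(list)
    for name, text in segments + links:  # S lines precede L lines in each part
        groups[dsu.find(name)].append(text)
    return sorted(groups.values(), key=len, reverse=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python split_gfa.py PG_1.gfa -> PG_1.part0.gfa, PG_1.part1.gfa, ...
    path = sys.argv[1]
    with open(path) as fh:
        parts = partition_gfa(fh)
    stem = path.rsplit(".", 1)[0]
    for i, part in enumerate(parts):
        with open(f"{stem}.part{i}.gfa", "w") as out:
            out.writelines(part)
```

Each part can then be converted with `vg convert` and chopped with `vg mod -X 256` as in the question. Separately converted graphs may reuse node IDs, so vg's whole-genome indexing recipe first gives them a joint ID space with `vg ids -j` and then passes all parts to a single `vg index -g` call, which is what lets GCSA2 load kmers one file at a time.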
