Question

Extract linear representation of paths in VG graph

5.4 years ago

Ian Fiddes ▴ 70

I am trying to use VG as a compression tool for collections of highly related genomes. For this to work, I need random access to subregions of an input genome, which is stored as a path in the graph. How can this be accomplished with VG?

So far, as a test case, I have built a graph from a few sequences and converted it to a sorted XG file. I can use vg find -p to select a subset of the graph, but how do I convert that subgraph back to a linear sequence?

updated 5.4 years ago by Jouni Sirén ▴ 750 • written 5.4 years ago by Ian Fiddes ▴ 70

score 2 · Accepted Answer · 2020-03-19

You can use vg paths -F to extract entire paths in FASTA format. By default, this extracts all paths in the graph. You can use option -p FILE to specify a list of path names in a file and option -Q PREFIX to extract all paths with the given name prefix.
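As a rough sketch, the three selection modes described above could be wrapped like this. The function names are hypothetical, and reading the graph from an XG index with -x is an assumption (the example in this answer reads a vg file with -v instead). The list file is assumed to hold one path name per line.

```shell
# Hypothetical wrappers around the vg paths options described above.
# Assumption: the graph is an XG index and is passed to vg paths with -x.

# All paths in the graph, as FASTA on stdout.
all_paths_fasta() {
    vg paths -x "$1" -F
}

# Only the paths named in a list file (assumed: one path name per line).
listed_paths_fasta() {
    vg paths -x "$1" -F -p "$2"
}

# Only the paths whose names start with the given prefix.
prefixed_paths_fasta() {
    vg paths -x "$1" -F -Q "$2"
}
```

For example, prefixed_paths_fasta graph.xg sample1 > sample1.fa would write every path whose name begins with sample1, where graph.xg and sample1 are placeholder names.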

In order to extract subpaths, you can combine vg paths with vg find. An example:

vg find -p 22:30000000-30100000 -x chr22.xg | vg paths -v - -F > output.fa

Option -v tells vg paths that the input is a vg file, and filename - means stdin.
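For repeated region queries, the pipeline above can be wrapped in a small shell function. This is only a sketch: the name extract_region and its argument order are made up for illustration, it assumes vg is on the PATH, and it uses nothing beyond the options shown in the example.

```shell
# Hypothetical helper: extract_region XG_FILE PATH_NAME START END
# Writes the selected subpath as FASTA to stdout.
extract_region() {
    if [ "$#" -ne 4 ]; then
        echo "usage: extract_region XG_FILE PATH_NAME START END" >&2
        return 1
    fi
    # vg find -p takes a region written as PATH:START-END and emits a subgraph;
    # vg paths -v - reads that subgraph from stdin and -F prints it as FASTA.
    vg find -p "$2:$3-$4" -x "$1" | vg paths -v - -F
}
```

Calling extract_region chr22.xg 22 30000000 30100000 > output.fa runs the same command as the example.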