Extract linear representation of paths in VG graph
4.7 years ago
Ian Fiddes ▴ 70

I am trying to use VG effectively as a genome compression tool for collections of highly related genomes. A key component of this is that I need to be able to randomly access subregions of an input genome, or path. How can this be accomplished with VG?

So far, as a test case, I have built a graph from a few sequences and converted it to a sorted XG file. I can use vg find -p to select a subset of the graph, but how do I convert that subgraph back to a linear sequence?

4.7 years ago
Jouni Sirén ▴ 470

You can use vg paths -F to extract entire paths in FASTA format. By default, this extracts all paths in the graph. You can use option -p FILE to specify a list of path names in a file and option -Q PREFIX to extract all paths with the given name prefix.
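A minimal sketch of the two selection options, using only the flags mentioned above. The helper names, the file names (graph.vg, names.txt), the path names, and the one-name-per-line layout of the list file are assumptions made for illustration, not something confirmed here:

```shell
# Hypothetical helper: write path names to a list file, one per line.
# The one-name-per-line layout is an assumption about what -p FILE expects.
write_path_list() {
    list_file=$1
    shift
    printf '%s\n' "$@" > "$list_file"
}

# Hypothetical helper: extract only the listed paths as FASTA.
# $1 is a vg graph file (placeholder name below), $2 is the list file.
extract_listed_paths() {
    vg paths -v "$1" -p "$2" -F
}

# Usage (not run here; all file and path names are placeholders):
#   write_path_list names.txt genomeA genomeB
#   extract_listed_paths graph.vg names.txt > selected.fa
#   vg paths -v graph.vg -Q genome -F > by_prefix.fa
```

The functions only wrap the command line, so the actual behaviour depends on the vg version installed; check vg paths --help if a flag is rejected.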

In order to extract subpaths, you can combine vg paths with vg find. An example:

vg find -p 22:30000000-30100000 -x chr22.xg | vg paths -v - -F > output.fa

Option -v tells vg paths that the input is a vg file, and filename - means stdin.
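For repeated random access, the same pipeline can be wrapped in a small function. This is only a sketch: the function names are hypothetical, and it assumes the region is always written as PATH:START-END, as in the example above:

```shell
# Hypothetical helper: format a region as PATH:START-END,
# the form passed to vg find -p in the example above.
region_spec() {
    printf '%s:%s-%s\n' "$1" "$2" "$3"
}

# Hypothetical helper: extract one region of a path from an XG index
# and print it as FASTA on stdout.
# Arguments: XG file, path name, start, end.
extract_region() {
    xg=$1
    region=$(region_spec "$2" "$3" "$4")
    vg find -p "$region" -x "$xg" | vg paths -v - -F
}

# Usage (not run here), equivalent to the command above:
#   extract_region chr22.xg 22 30000000 30100000 > output.fa
```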
