sequenceTubeMap input files
Mayank, 4 months ago

Hello,

I am trying to understand why sequenceTubeMap requires files like .vg.gbwt and .vg.xg as input. Wouldn't it be more convenient to use human-readable file types like JSON or plain text?

Also, how can I produce these files from data I already have in a human-readable format?

Thank you for your response!

Tags: sequencetubemap, vg
Answer, 4 months ago

It's mostly a question of efficiency. JSON representations of a eukaryotic pangenome graph are monstrously large. There is a human-readable, text-based format, GFA, which is frequently used as an interchange format between tools. However, naively loading GFA into memory produces data structures that are either impractically large or don't support the basic queries you need to compute on a graph. This is especially true when storing a large number of haplotype paths, which sequenceTubeMap needs for its visualizations.
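
To make the trade-off concrete, here is a minimal Python sketch of a naive GFA loader. It assumes a tiny GFA 1.0 input containing only S (segment), L (link) and P (path) lines and ignores all other record types; it is a toy for illustration, not how vg stores graphs. Every path is kept as an explicit list of oriented node IDs, so memory grows with the total length of all haplotype paths, which is the part the GBWT is designed to compress.

```python
from dataclasses import dataclass, field


@dataclass
class NaiveGraph:
    # node ID -> sequence
    segments: dict = field(default_factory=dict)
    # (from_id, from_orient, to_id, to_orient) tuples
    links: list = field(default_factory=list)
    # path name -> list of (node_id, orientation)
    paths: dict = field(default_factory=dict)


def parse_gfa(text):
    """Naively load the S, L and P lines of a GFA 1.0 string into Python objects."""
    graph = NaiveGraph()
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        kind = fields[0]
        if kind == "S":
            graph.segments[fields[1]] = fields[2]
        elif kind == "L":
            graph.links.append((fields[1], fields[2], fields[3], fields[4]))
        elif kind == "P":
            # Path steps look like "1+,2-,3+"; split each into ID and orientation.
            graph.paths[fields[1]] = [(s[:-1], s[-1]) for s in fields[2].split(",")]
        # H, W and other record types are ignored in this sketch.
    return graph


# A hypothetical four-node bubble with two haplotype paths.
EXAMPLE_GFA = "\n".join([
    "H\tVN:Z:1.0",
    "S\t1\tACGT",
    "S\t2\tT",
    "S\t3\tG",
    "S\t4\tCCA",
    "L\t1\t+\t2\t+\t0M",
    "L\t1\t+\t3\t+\t0M",
    "L\t2\t+\t4\t+\t0M",
    "L\t3\t+\t4\t+\t0M",
    "P\thap1\t1+,2+,4+\t*",
    "P\thap2\t1+,3+,4+\t*",
])

if __name__ == "__main__":
    g = parse_gfa(EXAMPLE_GFA)
    total_steps = sum(len(p) for p in g.paths.values())
    print(len(g.segments), len(g.links), len(g.paths), total_steps)
```

Here two haplotypes cost six stored steps; the count scales as (number of haplotypes) × (path length), which becomes impractical for many haplotypes over a whole-genome graph.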

Let's say I need to visualize some genomic data that I currently have in a human-readable format. How can I convert my files to these formats?

You would use the vg toolkit. The vg convert subcommand can convert a GFA to an XG index, and the vg gbwt subcommand can make GBWT indexes.
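
As a sketch of the GFA-to-XG step, the Python helper below shells out to vg convert. It assumes the vg binary is installed and on your PATH, and relies on vg convert's -g (GFA input) and -x (XG output) options, with the XG written to standard output; the file names are placeholders, so check vg convert --help for your installed version. GBWT construction has more choices (for example, which paths or haplotypes to index), so the vg gbwt documentation is a better guide for that step than this sketch.

```python
import shutil
import subprocess
from pathlib import Path


def xg_command(gfa_path):
    """Build the argv for converting a GFA file to XG (XG is written to stdout)."""
    return ["vg", "convert", "-g", str(gfa_path), "-x"]


def gfa_to_xg(gfa_path, xg_path):
    """Run vg convert and redirect its stdout into xg_path.

    Assumes the vg executable is installed and on PATH.
    """
    if shutil.which("vg") is None:
        raise FileNotFoundError("vg executable not found on PATH")
    with open(xg_path, "wb") as out:
        subprocess.run(xg_command(gfa_path), stdout=out, check=True)
    return Path(xg_path)


if __name__ == "__main__":
    # Hypothetical file name; replace with your own GFA.
    print(" ".join(xg_command("my_graph.gfa")))
```

The printed command is equivalent to running `vg convert -g my_graph.gfa -x > my_graph.xg` in a shell; `check=True` makes a failed conversion raise instead of leaving a silently truncated file.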

I was able to read the XG file, but I still can't open my GBWT file as GFA or any other readable format. I suspect I need to edit this file to change the number of tracks in a plot. The help menu that appears when I type vg gbwt isn't very informative on how to do that. Could you shed some light on this?

GBWT isn't convertible into a GFA on its own. The GBWT index contains only the haplotype paths, so it lacks the node sequences that are required to fully specify the graph. If you want to make a full graph with the GBWT, you should augment it into a GBZ, which is essentially a GBWT with node sequences added. You can do that in vg gbwt as well.
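
A toy Python model of that distinction: a GBWT-like collection stores each haplotype only as a walk of oriented node IDs, so on its own it cannot spell out DNA. Once you attach a node-ID-to-sequence table (the role the GBZ adds), every path can be turned back into a sequence. This is purely illustrative: the real GBWT is a compressed, BWT-based index rather than a dict of lists, and the names `toy_gbwt`, `node_sequences` and `spell_path` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


# "GBWT-like": haplotype name -> list of (node_id, is_reverse). No sequences.
toy_gbwt = {
    "sample1#hap1": [(1, False), (2, False), (4, False)],
    "sample1#hap2": [(1, False), (3, False), (4, False)],
}

# The extra ingredient a GBZ supplies: node ID -> forward-strand sequence.
node_sequences = {1: "ACGT", 2: "T", 3: "G", 4: "CCA"}


def spell_path(path, sequences):
    """Concatenate node sequences along a path, reverse-complementing reversed visits."""
    parts = []
    for node_id, is_reverse in path:
        if node_id not in sequences:
            raise KeyError(f"no sequence for node {node_id}; paths alone cannot spell DNA")
        seq = sequences[node_id]
        parts.append(reverse_complement(seq) if is_reverse else seq)
    return "".join(parts)


if __name__ == "__main__":
    for name, path in toy_gbwt.items():
        print(name, spell_path(path, node_sequences))
```

Calling `spell_path` with an empty sequence table raises `KeyError`, which is the toy equivalent of trying to write a GFA from a GBWT alone.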

Noted, but how can I change the number of tracks while keeping the .xg and .gbwt files separate?

Could you clarify what you mean by the "number of tracks"?

By tracks I mean the thick lines, one per sequence, that run through the various nodes. By changing the "number of tracks" I mean adding or removing these lines while making the corresponding changes to the .xg file.

In short: I want to add another sequence (which I think can be achieved by editing the GBWT file).

GBWTs are not particularly easy to edit, but if you have two GBWT files over a graph with the same node IDs, you can combine them with vg gbwt --merge. You can also remove haplotypes with vg gbwt --remove-sample.
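
Conceptually, --merge combines two haplotype collections defined over the same node IDs, and --remove-sample drops every path belonging to one sample. The Python sketch below models that on plain lists of (name, path) pairs; it is not the GBWT implementation, and the "sample#haplotype" naming scheme and the helper names are hypothetical. The node-ID check mirrors the requirement that both GBWTs refer to the same graph.

```python
def merge_haplotypes(a, b, graph_nodes):
    """Combine two lists of (name, [node_id, ...]) paths defined over the same graph."""
    for name, path in a + b:
        missing = sorted(set(path) - graph_nodes)
        if missing:
            raise ValueError(f"{name} uses node IDs not in the graph: {missing}")
    return a + b


def remove_sample(paths, sample):
    """Drop every path named 'sample#...' (hypothetical naming scheme)."""
    prefix = sample + "#"
    return [(name, path) for name, path in paths if not name.startswith(prefix)]


if __name__ == "__main__":
    nodes = {1, 2, 3, 4}
    a = [("sampleA#1", [1, 2, 4])]
    b = [("sampleB#1", [1, 3, 4]), ("sampleB#2", [1, 2, 4])]
    merged = merge_haplotypes(a, b, nodes)
    print(len(merged), remove_sample(merged, "sampleB"))
```

In this model, adding a track is just adding a path. A new sequence that would need nodes absent from the graph is rejected, which corresponds to the case where the graph itself (and therefore the XG) would also have to change.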

Thank you very much for your answers; I really appreciate it. Could you please point me to some material about the GBWT, ideally covering its properties and how it is generated?

This wiki article is probably the best resource: https://github.com/vgteam/vg/wiki/VG-GBWT-Subcommand. The academic papers on the GBWT are more focused on the underlying algorithmics.
