It looks like I need to set a max node length (something in the range of 256 - 1024) to be able to index my splice graph that includes intronic as well as exonic sequence. Indexing allows me to run mpmap and create a GAMP, and if I'm interested in viewing alignments on the graph, I can create a GAM by selecting, say, an isoform path, then view those alignments. The problem is that I now have 10s or 100s more nodes because the long introns have all been split up, so if I'm creating an SVG with dot piped from a view, it would be a reaaallly wide SVG. I think I could run vg mod to zip up the linear chains of nodes (haven't gotten that far), but that would make the GAMP invalid, right? Is there a better way to do all of this?
You are correct that this would invalidate the GAM. It's a tricky problem, and I actually don't think that compacting the nodes would help all that much. For long introns, the size in the dot output will still be pretty large even without the unnecessary edges. In addition, dot sometimes makes some pretty funky layout decisions on spliced graphs. In general, there hasn't been a lot of work on visualizing RNA-seq alignments for pangenomes. It might be easiest to use vg surject to make a BAM for IGV, although you could definitely incur some reference bias doing so. Another option would be to try to combine the haplotype transcript GBWT with a haplotype GBWT so that you could use SequenceTubeMap on the GAM, but you would be trailblazing a bit to do that.
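To put rough numbers on the node blow-up discussed above, here is a minimal Python sketch of the arithmetic. It assumes a linear stretch of sequence is simply cut into consecutive pieces of at most the max node length; the function name and the 50,000 bp intron are hypothetical and are not taken from the VTA1 graph.

```python
def chopped_node_count(segment_length: int, max_node_length: int) -> int:
    """Nodes produced when a linear segment is cut into pieces of at most
    max_node_length bases (assumed chopping model, not vg's actual code)."""
    if segment_length < 0 or max_node_length <= 0:
        raise ValueError("lengths must be non-negative, and the max node length positive")
    # Ceiling division using integers only.
    return -(-segment_length // max_node_length)


# Hypothetical 50,000 bp intron at the two ends of the 256-1024 range.
for max_len in (256, 1024):
    print(max_len, chopped_node_count(50_000, max_len))
```

Under this model, one long intron alone accounts for dozens to hundreds of nodes. Zipping the chain back up would reduce the node count, but not the amount of sequence that dot has to lay out.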
I thought I could just select alignments from the GAMP in a filter command, outputting a (mostly) single-path GAMP, then convert to GAM. The commands below all run, but I'm getting no alignments in the final SVG. I've got a splicing graph of the VTA1 gene created from a simple hand-edited GFA (just one sequence and a "chromosomal path" consisting of the genomic DNA, then 'vg rna' run on coordinate-shifted annotations for the gene). Then:
This is my first foray into vg, so please let me know where I'm going wrong. When I first posted I thought mod was what I needed, but looking into the options, filter seemed like the way to go.
Also, I run into a problem when trying to introduce a fusion variant:
The error was from vg construct:
Are interchromosomal translocations not supported, or should I be specifying it a different way? I hand-coded it, so it's possible I've messed up the format. I don't have haplotypes at the moment ... is the GBWT route still feasible / appropriate?
Unfortunately, no, they are not supported. I believe the only symbolic alleles that VG supports are INS, DEL, and INV.
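As a rough pre-flight check before running vg construct, the ALT column of a hand-coded VCF can be screened for alleles outside that set. This is only a sketch: the supported set is copied from the reply above rather than verified against vg, the function name is hypothetical, and it assumes the fusion was written either as a symbolic allele or in VCF breakend (bracket) notation.

```python
# Symbolic alleles named in the reply above (assumption: not checked against vg itself).
SUPPORTED_SYMBOLIC = {"INS", "DEL", "INV"}


def classify_alt(alt: str) -> str:
    """Roughly classify one VCF ALT value.

    Single breakends (ALT starting or ending with '.') are not handled here.
    """
    if alt.startswith("<") and alt.endswith(">"):
        name = alt[1:-1]
        if name in SUPPORTED_SYMBOLIC:
            return "symbolic-supported"
        return "symbolic-unsupported"
    if "[" in alt or "]" in alt:
        # Breakend notation, e.g. a mate on another chromosome.
        return "breakend"
    return "sequence"


for alt in ("<DEL>", "<DUP>", "G]17:198982]", "ACGT"):
    print(alt, classify_alt(alt))
```

Anything reported as "breakend" or "symbolic-unsupported" would, going by the reply above, need to be represented some other way.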
I'm not familiar with all of these functionalities in jq, but conceptually this script looks correct to me. The vg view steps could be pretty slow, depending on how much data you have.