vg mod changes a graph, but what about a gam or gamp to the original graph?
0
0
Entering edit mode
5 weeks ago
joseph.fass ▴ 10

It looks like I need to set a max node length (something in the range of 256 - 1024) to be able to index my splice graph that includes intronic as well as exonic sequence. Indexing allows me to run mpmap and create a GAMP, and if I'm interested in viewing alignments on the graph, I can create a GAM by selecting, say, an isoform path, then view those alignments. The problem is that I now have 10s or 100s more nodes because the long introns have all been split up, so if I'm creating an SVG with dot piped from a view, it would be a reaaallly wide SVG. I think I could run vg mod to zip up the linear chains of nodes (haven't gotten that far), but that would make the GAMP invalid, right? Is there a better way to do all of this?

vg • 500 views
ADD COMMENT
0
Entering edit mode

You are correct that this would invalidate the GAM. It's a tricky problem, and I actually don't think that compacting the nodes would help all that much. For long introns, the size in the dot output will still be pretty large even without the unnecessary edges. In addition, dot sometimes makes some pretty funky layout decisions on spliced graphs. In general, here hasn't been a lot of work on visualizing RNA-seq alignments for pangenomes. It might be easiest to use vg surject to make a BAM for IGV, although you could definitely incur some reference bias doing so. Another option would be to try to combine the haplotype transcript GBWT with a haplotype GBWT so that you could use SequenceTubeMap on the GAM, but you would be trailblazing a bit to do that.

ADD REPLY
0
Entering edit mode

I thought I could just select alignments from the GAMP in a filter command, outputting a single-path (mostly) GAMP, then convert to GAM. The commands below all run, but I'm getting no alignments in the final svg. I've got a splicing graph of the VTA1 gene created from a simple hand-edited GFA (just one sequence and a "chromosomal path" consisting of the genomic dna, then 'vg rna' run on coordinate-shifted annotations for the gene). Then:

# align to the full graph; this is what required the max node length
vg mpmap --nt-type RNA --graph-name spliced.xg --dist-name spliced.dist --gcsa-name spliced.gcsa -f r1.fq.gz -f r2.fq.gz > aln.gamp

# grab a list of node ids for one path ... one of the isoforms
vg find --xg-name spliced.xg --path VTA1-201_R1 | vg view --json - | jq .node - | grep id | cut -f2 -d: | cut -f2 -d\" | sort -nu > temp.node_list

# query all the read ids that map to those nodes
cat temp.node_list | while read node_id; do
  cat temp.json | jq --slurp \
    | jq --arg nid $node_id '.[] | select(.subpath[]?.path.mapping[].position.node_id == $nid) | .name' \
    >> temp.read_ids
done
cat temp.read_ids | tr -d \" | sort -u > temp; mv temp temp.read_ids

# filter down the GAMP to only contain reads that align to nodes in the path ... though this could spill over to other nodes I'd guess
vg filter --input-mp-alns --exact-name --name-prefixes temp.read_ids --xg-name spliced.xg trna.gamp > trna.VTA1-201_R1.gamp                                                                   
# convert GAMP to GAM
vg view --multipath-in --gam trna.VTA1-201_R1.gamp \
  | vg view --dot --simple-dot --show-paths --ascii-labels spliced.xg --aln-graph - \
  | dot -T svg -o spliced.VTA1-201_R1.alns.svg

This is my first foray into vg, so please let me know where I'm going wrong. When I first posted I thought mod was what I needed, but looking into the options, filter seemed like the way to go.

ADD REPLY
0
Entering edit mode

Also, I run into a problem when trying to introduce a fusion variant:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
esr1    287818  esr1__gdna      G       G[vta1:51278[   100     PASS    SVTYPE=BND

The error was from vg construct:

vg construct --handle-sv --reference esr1__vta1.fa --vcf esr1__vta1.vcf > new.vg
index file esr1__vta1.fa.fai not found, generating...
warning:[vg::Constructor] vcflib could not canonicalize some SVs to base-level sequence; skipping variants like: esr1   287818  esr1__gdna      G       G[vta1:51278[   100     PASS    SVTYPE=BND

Are interchromosomal translocations not supported, or should I be specifying it a different way? I hand-coded it, so it's possible I've messed up the format. I don't have haplotypes at the moment ... is the GBWT route still feasible / appropriate?

ADD REPLY
0
Entering edit mode

Unfortunately, no, they are not supported. I believe the only symbolic alleles that VG supports are INS, DEL, and INV.

I'm not familiar with all of these functionalities in jq, but conceptually this script looks correct to me. The vg view steps could be pretty slow, depending how much data you have.

ADD REPLY

Login before adding your answer.

Traffic: 4615 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6