Question

GraphAligner/vg surject - alignment output nonsense?

1

Entering edit mode

18 months ago

Andrew ▴ 30

Following on from some advice I received in a previous post, I've attempted to use GraphAligner and vg surject to map long reads to the T2T reference genome, and it seems to be resulting in nonsense outputs - a much smaller BAM than a comparable linearly aligned BAM from minimap2 but with 16x more alignments (631.8 million versus 39.6 million, from a source fasta containing 29.8 million reads) and yet when I attempt to call SVs on it using sniffles2 and cuteSV I get zero results. This makes no sense, so I'm wondering what exactly is going on here?

I'm fairly certain it's not sniffles2 or cuteSV, they did successfully call variants when used on the output from minimap2 based on the same input data. I'm not sure what I can adjust on the GraphAligner or vg surject front, I'm using default settings (and in the case of the former, its presets for vg-style graph inputs). The one thing I know is going on is that vg surject is hitting a hard limit on alignment size on some subgraphs and stopping early for some reads - how many, I don't know, because it refuses to give further warnings past the first.

My pipeline (with the data sources and software versions listed) is here, and a sample error log from the GraphAligner and vg surject section of the process is here. I'd appreciate advice from anyone who has more experience with these tools - have a sneaking suspicion that I'm running into the same issue of "these tools can't quite do what you're asking of them" from before.

vg-toolkit graphaligner vg vg-surject vgteam • 1.4k views

ADD COMMENT • link updated 12 weeks ago by stringsAsBaxters • 0 • written 18 months ago by Andrew ▴ 30

1

Entering edit mode

One thing to check out is whether the GAM file from GraphAligner looks at all suspicious. To look at a few alignments you can use vg view -aj alns.gam | head

ADD REPLY • link 18 months ago by Jordan M Eizenga ▴ 640

0

Entering edit mode

Will do - I looked at a sampling of the BAM alignments wondering the same thing, and they looked similar enough to the ones from the linear BAM just on sight (then again, I wasn't entirely sure what would single an alignment out as faulty, so it's possible I missed something). Is there anything specific I should be looking for in the GAM?

Edit: nothing looks weird (well, I don't really know what "weird" is, but it's at least internally consistent from what I can tell) - I'm thinking there's something weird about the surject step and how it's moving from GAM to BAM. Checked the BAM headers out of curiosity and I'm pretty sure that's not where the problem is. Aside from some messiness from merging, they contain the same flags for chromosome name and length as the minimap2-generated BAM, as well as ID tags from vg.

Edit 2: also to clarify, the alignments in the BAM from this pipeline look similar to what I'm seeing in the linear BAM I made for comparison, but for some reason downstream tools aren't able to call anything on it.

ADD REPLY • link 18 months ago by Andrew ▴ 30

0

Entering edit mode

Did you ever solve this?

GraphAligner doesn't retain the base quality scores in the GAM output, see: https://github.com/maickrau/GraphAligner/issues/84

If the variant callers you're using require those scores, then maybe that's why they fail?

ADD REPLY • link 12 weeks ago by stringsAsBaxters • 0