I have a .gfa created by running the FASTA files of two genomes through the Cactus-Minigraph pipeline. I am aligning PacBio HiFi reads to that graph reference using GraphAligner. Column 6 of the resulting GAF file lists the segments on each read's alignment path.
I expect to see reads that come exclusively from one genome or the other, and also 'recombined' reads. What is an efficient way to classify each read as coming from Genome A, from Genome B, or recombined/mixed? Relatedly, how do I get the list of segment IDs belonging to each genome so that I can use it to 'decode' the parent from the segment IDs in column 6 of the GAF file?
It's not exactly clean, but one option is to run `vg paths -A` with `-p` set to extract each of the two input genomes individually as GAF, then run `vg pack -d` on each genome path GAF to get a table of the nodes that are included.
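Once you have the two node tables, the decoding step could look roughly like the sketch below. It assumes each `vg pack -d` output is a tab-separated table with a header row containing `node.id` and `coverage` columns, and that GraphAligner writes column 6 as oriented node steps such as `>12<13>14`. Both are assumptions to check against your own files. The script name `classify_reads.py` and the labels (`A`, `B`, `mixed`, `ambiguous`) are hypothetical. Nodes covered by only one genome path are treated as "private" to that genome. A read touching private nodes of both genomes is called mixed, and a read that only crosses shared nodes cannot be assigned.

```python
import csv
import re
import sys
from collections import Counter

# Assumption: the GAF path (column 6) is written as oriented steps like ">12<13>14".
STEP = re.compile(r"[<>]([^<>]+)")


def load_nodes(pack_table_path):
    """Return node IDs with nonzero coverage in a `vg pack -d` table.

    Assumes a tab-separated file whose header includes 'node.id' and 'coverage'.
    The table has one row per base, so a set is used to collapse repeats.
    """
    nodes = set()
    with open(pack_table_path, newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            if int(row["coverage"]) > 0:
                nodes.add(row["node.id"])
    return nodes


def path_nodes(gaf_path_field):
    """Split a GAF path string like '>12<13>14' into node IDs ['12', '13', '14']."""
    return STEP.findall(gaf_path_field)


def classify(nodes, a_only, b_only):
    """Label a read by which genome-private nodes its alignment path touches."""
    hits_a = any(n in a_only for n in nodes)
    hits_b = any(n in b_only for n in nodes)
    if hits_a and hits_b:
        return "mixed"
    if hits_a:
        return "A"
    if hits_b:
        return "B"
    return "ambiguous"  # only shared nodes (or none) on the path


def classify_gaf(gaf_path, nodes_a, nodes_b):
    """Yield (read name, label) for every alignment line in a GAF file."""
    a_only = nodes_a - nodes_b
    b_only = nodes_b - nodes_a
    with open(gaf_path) as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 12:
                continue  # GAF has 12 mandatory columns; skip anything shorter
            yield fields[0], classify(path_nodes(fields[5]), a_only, b_only)


def main(argv):
    if len(argv) != 4:
        print("usage: classify_reads.py A.pack.tsv B.pack.tsv reads.gaf", file=sys.stderr)
        return 1
    nodes_a = load_nodes(argv[1])
    nodes_b = load_nodes(argv[2])
    counts = Counter()
    for name, label in classify_gaf(argv[3], nodes_a, nodes_b):
        counts[label] += 1
        print(f"{name}\t{label}")
    print(dict(counts), file=sys.stderr)
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

A single private node is enough to flip a label here, so sequencing errors or small misalignments can produce spurious "mixed" calls. Requiring a minimum number (or fraction) of private nodes per genome is a simple way to make the call more conservative. If GraphAligner reports several alignments for one read, each GAF line is labeled separately and you would need to combine them per read name.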