Identify parent of each read in a GAF

15 months ago

cfourps ▴ 10

I have a GFA file created by running the FASTA files of two genomes through the Minigraph-Cactus pipeline. I am aligning PacBio HiFi reads to that graph using GraphAligner. Column 6 of the resulting GAF file lists the segments on the alignment path of each read.

I expect to see reads that come exclusively from one genome or the other, and also 'recombined' reads. What is an efficient way to classify each read as coming from Genome A, from Genome B, or as recombined/mixed? Relatedly, how do I get the list of segment IDs belonging to each genome, so that I can use it to 'decode' the parent from the segment IDs in column 6 of the GAF file?

gaf vgteam vg • 623 views

ADD COMMENT • link updated 15 months ago by Jordan M Eizenga ▴ 660 • written 15 months ago by cfourps ▴ 10

It's not exactly clean, but one option is to run vg paths -A with -p set to extract each of the two input genomes individually as GAF, then run vg pack -d on each genome's path GAF to get a table of the nodes that genome includes.

ADD REPLY • link 15 months ago by Jordan M Eizenga ▴ 660
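
Once a per-genome node table exists, the 'decoding' step asked about in the question could look roughly like the sketch below. It is not part of the reply above and rests on several assumptions: that each table is tab-separated with a header containing node.id and coverage columns (check the header of your own vg pack -d output), that the segment names in the GAF match the node IDs in those tables, and that column 6 uses the oriented >id<id path syntax. The function names and the four labels (A, B, mixed, uninformative) are made up for illustration. Nodes present in both genomes say nothing about the parent, so only nodes private to one genome are counted.

```python
import csv
import re

# Oriented GAF path steps look like ">12<13>15"; capture the ID after each arrow.
SEGMENT = re.compile(r"[<>]([^<>\s]+)")

def load_covered_nodes(table_lines):
    """Return node IDs with nonzero coverage from a tab-separated table.

    Assumes a header row with 'node.id' and 'coverage' columns.
    """
    reader = csv.DictReader(table_lines, delimiter="\t")
    return {row["node.id"] for row in reader if int(row["coverage"]) > 0}

def path_segments(path_field):
    """Split GAF column 6 into segment IDs (a bare name is kept as one ID)."""
    ids = SEGMENT.findall(path_field)
    return ids if ids else [path_field]

def private_counts(path_field, nodes_a, nodes_b):
    """Count path segments found only in genome A and only in genome B."""
    only_a = only_b = 0
    for seg in path_segments(path_field):
        in_a, in_b = seg in nodes_a, seg in nodes_b
        if in_a and not in_b:
            only_a += 1
        elif in_b and not in_a:
            only_b += 1
    return only_a, only_b

def label(only_a, only_b):
    """Turn private-node counts into a parent label."""
    if only_a and only_b:
        return "mixed"
    if only_a:
        return "A"
    if only_b:
        return "B"
    return "uninformative"  # path touches only shared or unknown nodes

def classify_gaf(gaf_lines, nodes_a, nodes_b):
    """Map read name -> label, pooling evidence over all records of a read."""
    evidence = {}
    for line in gaf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        a, b = private_counts(fields[5], nodes_a, nodes_b)
        prev_a, prev_b = evidence.get(fields[0], (0, 0))
        evidence[fields[0]] = (prev_a + a, prev_b + b)
    return {read: label(a, b) for read, (a, b) in evidence.items()}
```

To use it, open the two node tables and the GraphAligner GAF as text files and pass the file handles in place of the line lists. Because a single misaligned private node is enough to flip a read to mixed, it may be worth requiring a minimum count on each side before trusting that label.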
