Identify parent of each read in a GAF
15 months ago
cfourps ▴ 10

I have a .gfa created by running the FASTA files of two genomes through the Cactus-Minigraph pipeline. I am aligning PacBio HiFi reads to that reference using GraphAligner. Column 6 of the resulting GAF file lists the segments on the alignment path of each read.

I expect to see reads that come exclusively from either genome, and also 'recombined' reads. What is an efficient way to classify each read as coming from Genome A, from Genome B, or as recombined/mixed? Relatedly, how do I get the list of segment IDs belonging to each genome so that I can use it to 'decode' the parent from the segment IDs in column 6 of the GAF file?

gaf vgteam vg • 623 views

It's not exactly clean, but one option is to run vg paths -A with -p set to extract each of the two input genomes individually as GAF, then run vg pack -d on each genome's path GAF to get a table of the nodes that genome includes.
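Once you have a set of node IDs per genome, the decoding step could look roughly like the sketch below. It is only an illustration under a few assumptions: the function names (`path_segments`, `classify_path`, `classify_gaf`) are made up for this example, the two node sets are assumed to be already loaded as Python sets of strings (parsing the vg pack -d table is not shown, since its layout should be checked against your own output), and nodes shared by both genomes are treated as uninformative.

```python
import re

# GAF column 6 is an oriented path such as ">s1>s2<s3".
# Each step is an orientation marker (> or <) followed by a segment ID.
SEGMENT_RE = re.compile(r"[<>]([^<>]+)")


def path_segments(path_field):
    """Return the segment IDs on a GAF path, dropping orientation markers."""
    ids = SEGMENT_RE.findall(path_field)
    # A path written without orientation markers is kept as a single ID.
    return ids if ids else [path_field]


def classify_path(path_field, nodes_a, nodes_b):
    """Classify one path as 'A', 'B', 'mixed', or 'uninformative'.

    Assumption: nodes present in both genomes (or in neither set) carry
    no information about the parent and are ignored.
    """
    saw_a_only = False
    saw_b_only = False
    for seg in path_segments(path_field):
        in_a = seg in nodes_a
        in_b = seg in nodes_b
        if in_a and not in_b:
            saw_a_only = True
        elif in_b and not in_a:
            saw_b_only = True
    if saw_a_only and saw_b_only:
        return "mixed"
    if saw_a_only:
        return "A"
    if saw_b_only:
        return "B"
    return "uninformative"


def classify_gaf(lines, nodes_a, nodes_b):
    """Map read name (column 1) to a class, using the path in column 6.

    Assumption: one alignment line per read; if a read has several
    lines, the last one wins in this simple version.
    """
    calls = {}
    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        calls[fields[0]] = classify_path(fields[5], nodes_a, nodes_b)
    return calls
```

To use it, open the GraphAligner GAF and pass the file handle as `lines`. Reads that come back as "mixed" touch nodes private to both genomes, so they are candidates for recombined reads, though a mixed call can also come from a misalignment and is worth checking before you trust it.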
