Right method to annotate genes on pangenome graph
6 weeks ago
Quanyu ▴ 20

Hello dear friends!

First, I should say that I'm new to working with graphs and am still finding my way around the various formats (e.g. .vg, .xg, .gbz, .gam ...).

As a beginner, I need some help:

I have a human pangenome graph built from several genomes, with genome_a as the reference. I want to see the locations of some gene regions of interest in my graph, like Fig. 5d in the HPRC publication. Because regions such as the MHC are highly complex, the existing gene annotations there are not reliable enough to simply draw the gene locations from them. Therefore, I turned to the graph itself to get locally detailed and confident gene annotations. At first I tried this method (which follows the odgi tutorial):

  1. extract subgraphs with odgi
  2. get a BED file of the genes of interest and inject it into the graph
  3. run odgi untangle on the injected graph to see the locations of the genes on each path

However, I found that for genes with CNV, this method often fails to capture all gene copies (usually it finds just one copy), so I have been looking for another method. For now, I intend to:

  1. align the sequences of the genes of interest (e.g. HLA genes extracted from GRCh38.p14) to the graph using GraphAligner
  2. use the alignment generated in step 1 to get the gene locations on each haplotype of my graph

For step 2, I initially used vg annotate, but it seems to work only for the reference path (#4158). So I tried vg surject with these commands:

vg paths -x graph.vg -L > graph.vg.paths
vg surject -x graph.vg -t 8 -F graph.vg.paths -M -b genes_sequence_To_graph.gam > genes_sequence_To_graph.bam

which has not produced any results yet as I write this.

Also, in #4158 the developers suggested:

but if you have the GAF and you have the GFA you can compare the node names that the GAF reads visit against the node names that each GFA path visits, and find the nodes at which each read intersects with each path it touches.

I think I could also use this admittedly crude method to get the gene locations from the GAF file that GraphAligner generated.

I don't know whether the vg surject command above will produce a correct alignment file containing the gene locations on each path. Can anybody give me some advice on my process and method, or suggest any other helpful method? Please!
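
A minimal sketch of that node-intersection idea in Python, under several assumptions that are not from the thread: the GFA stores haplotypes as GFA 1.0 `P` lines (not `W` lines) with sequences written out on the `S` lines, and the GAF path column lists oriented node IDs such as `>2>3`. The function names, the toy graph, and the query name `geneX` are all hypothetical. Coordinates are 0-based, half-open offsets from the start of each path at whole-node resolution, so partial overlap with the first and last node and strand are ignored.

```python
import re
from collections import defaultdict

def parse_gfa(lines):
    """Collect node lengths (S lines) and node lists per path (P lines)."""
    node_len, paths = {}, {}
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if f[0] == "S":
            node_len[f[1]] = len(f[2])
        elif f[0] == "P":
            # Steps look like "1+,2+,3-"; drop the trailing orientation sign.
            paths[f[1]] = [step[:-1] for step in f[2].split(",")]
    return node_len, paths

def index_paths(node_len, paths):
    """Map node -> path name -> list of (start, end) offsets on that path."""
    index = defaultdict(lambda: defaultdict(list))
    for name, steps in paths.items():
        pos = 0
        for node in steps:
            end = pos + node_len[node]
            index[node][name].append((pos, end))
            pos = end
    return index

def gaf_nodes(path_field):
    """Node IDs from a GAF path column such as '>1>2<3'."""
    return re.findall(r"[<>]([^<>]+)", path_field)

def locate(gaf_lines, index, max_gap=0):
    """Yield (query, path, start, end) for each run of touched nodes."""
    for line in gaf_lines:
        f = line.rstrip("\n").split("\t")
        query, nodes = f[0], gaf_nodes(f[5])
        per_path = defaultdict(list)
        for node in set(nodes):
            for path, intervals in index.get(node, {}).items():
                per_path[path].extend(intervals)
        for path, intervals in sorted(per_path.items()):
            intervals.sort()
            start, end = intervals[0]
            for s, e in intervals[1:]:
                if s - end <= max_gap:      # contiguous: extend current run
                    end = max(end, e)
                else:                       # gap: report run, start a new one
                    yield (query, path, start, end)
                    start, end = s, e
            yield (query, path, start, end)

# Toy data: hapB visits node 3 twice, so the query touches it in two places.
gfa = [
    "S\t1\tACGT", "S\t2\tGG", "S\t3\tTTT", "S\t4\tA", "S\t5\tCC",
    "P\thapA\t1+,2+,3+,5+\t*",
    "P\thapB\t1+,4+,3+,5+,2+,3+\t*",
]
gaf = ["geneX\t5\t0\t5\t+\t>2>3\t5\t0\t5\t5\t5\t60"]

node_len, paths = parse_gfa(gfa)
results = list(locate(gaf, index_paths(node_len, paths)))
for row in results:
    print(*row, sep="\t")
```

Because every occurrence of a node on a path is indexed, a path that traverses the same nodes more than once is reported once per run, which is the behaviour wanted for multi-copy genes. Raising `max_gap` merges runs separated by short unaligned stretches, and short runs such as a single shared node would need filtering by length before being called gene copies.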

Best wishes! Thanks!

pangenome vg • 399 views
5 weeks ago

I haven't found a good method to add gene annotations to a pangenome graph either. I think it is a missing feature.

I had a go at odgi inject in this gist. I commented out the vg part because, in my opinion, it worked less well.
