Question

Question about variant calling methods using a pangenome graph

8 months ago

Marion • 0

Hello,

I have a question about the methodology for comparing the number of SNPs called with vg giraffe on a pangenome graph and with BWA-MEM2 on a linear reference.

I have seen two different methods in publications.

The first converts the .gam alignments to .bam with vg surject, then runs a regular variant calling pipeline against the linear reference that was used as the backbone to construct the pangenome graph. I saw this used in several papers, like here or here.
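A minimal Python sketch of the surject route, assuming a GBZ graph, a Giraffe GAM and the backbone FASTA (the file names and the `surject_pipeline` helper are hypothetical). It only assembles the command lines: `vg surject -x ... -b` writes BAM to stdout, and the BAM is then sorted, indexed and passed to GATK HaplotypeCaller. GATK also expects read groups plus a `.fai`/`.dict` for the reference, which this sketch does not create.

```python
import subprocess
from typing import List, Optional, Tuple

# Each step is (argv, file to redirect stdout into, or None).
Step = Tuple[List[str], Optional[str]]


def surject_pipeline(graph: str, gam: str, reference: str, prefix: str) -> List[Step]:
    bam = f"{prefix}.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    return [
        # Project graph alignments onto the reference paths; -b emits BAM on stdout.
        (["vg", "surject", "-x", graph, "-b", gam], bam),
        # Coordinate-sort and index so GATK can read the BAM.
        (["samtools", "sort", "-o", sorted_bam, bam], None),
        (["samtools", "index", sorted_bam], None),
        # Call variants against the same linear reference used as the graph backbone.
        (["gatk", "HaplotypeCaller", "-R", reference, "-I", sorted_bam,
          "-O", f"{prefix}.vcf.gz"], None),
    ]


def run(steps: List[Step]) -> None:
    """Execute the steps in order; requires vg, samtools and gatk on PATH."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "wb") as out:
                subprocess.run(argv, check=True, stdout=out)


if __name__ == "__main__":
    # Print the commands instead of running them.
    for argv, out in surject_pipeline("graph.gbz", "sample.gam", "ref.fa", "sample"):
        print(" ".join(argv) + (f" > {out}" if out else ""))
```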

The second method, used here, runs vg augment on the alignments, followed by vg pack, vg snarls and finally vg call.
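The augment route as a similar sketch, again only assembling commands. It assumes a graph file in a format `vg augment` accepts; the file names and the `augment_pipeline` helper are hypothetical, and `-Q 5` is an illustrative minimum mapping quality for `vg pack`. Options can differ between vg versions, so check `vg <subcommand> --help` before running anything.

```python
from typing import List, Optional, Tuple

# Each step is (argv, file to redirect stdout into, or None).
Step = Tuple[List[str], Optional[str]]


def augment_pipeline(graph: str, gam: str, prefix: str) -> List[Step]:
    aug_graph = f"{prefix}.aug.vg"
    aug_gam = f"{prefix}.aug.gam"
    pack = f"{prefix}.aug.pack"
    snarls = f"{prefix}.aug.snarls"
    return [
        # Add the edits found in the alignments to the graph; -A writes the
        # alignments re-expressed against the augmented graph.
        (["vg", "augment", graph, gam, "-A", aug_gam], aug_graph),
        # Compute read support on the augmented graph (-Q: minimum mapping quality).
        (["vg", "pack", "-x", aug_graph, "-g", aug_gam, "-Q", "5", "-o", pack], None),
        # Precompute the snarls (variant sites) so vg call does not recompute them.
        (["vg", "snarls", aug_graph], snarls),
        # Genotype the snarls from the packed support and write a VCF.
        (["vg", "call", aug_graph, "-r", snarls, "-k", pack], f"{prefix}.vcf"),
    ]


if __name__ == "__main__":
    # Print the commands instead of running them.
    for argv, out in augment_pipeline("graph.vg", "sample.gam", "sample"):
        print(" ".join(argv) + (f" > {out}" if out else ""))
```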

Is there a particular method that you would recommend for doing that?

I wish you a nice day. Regards, Marion

vg • 1.5k views

Written 8 months ago by Marion • updated 8 months ago by Jouni Sirén

score 1 · Answer 1 · 2024-11-06

8 months ago

zhang yi xing ▴ 50

Hi, dear, I believe you can find the answer to these questions here: vg call vs vg surject

Hi,

Thank you for your answer.

My issue has evolved a bit since then. I tried the surject method, which led to a ~20% decrease in the number of aligned reads in the resulting .bam file. Consequently, I call far fewer variants than when I use a regular linear reference with the same downstream variant calling method (GATK in my case).
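One way to quantify that drop is to compare the number of reads aligned to the graph (for example as reported by `vg stats -a` on the GAM) with the number of primary mapped records in the surjected BAM. The sketch below counts the latter from SAM text using the standard flag bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary); the helper names are hypothetical and the numbers in the test are illustrative only.

```python
from typing import Iterable

UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def is_primary_mapped(flag: int) -> bool:
    """True for a mapped record that is neither secondary nor supplementary."""
    return (flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY)) == 0


def count_primary_mapped(sam_lines: Iterable[str]) -> int:
    """Count primary mapped records in SAM text, skipping header and blank lines."""
    count = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        flag = int(line.split("\t", 2)[1])  # FLAG is the second SAM column
        if is_primary_mapped(flag):
            count += 1
    return count


def fraction_lost(graph_aligned: int, surjected_mapped: int) -> float:
    """Share of graph-aligned reads that did not survive surjection."""
    return 1.0 - surjected_mapped / graph_aligned
```

The SAM lines can come from `samtools view sample.bam`; `samtools view -c -F 0x904 sample.bam` should give the same count directly.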

I am now trying to see how I could improve this, and whether other methods for variant calling on pangenome graphs could be applied to divergent species.

Reply 8 months ago by Marion

The variant calling method you want does not exist yet.

If you are using short reads, the vg surject approach works best with graphs that do not contain very large structural variants. Otherwise many reads will map to locations that are nowhere near the reference sequence. Those alignments cannot be projected onto the reference, and vg surject will drop them.
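A toy model of why these reads are lost, not vg's actual surjection algorithm (which is considerably more involved): treat an alignment as the list of graph nodes it visits and project it only through the nodes that lie on the reference path. A short read that falls entirely inside a large insertion touches no reference node and is dropped. The node IDs, lengths and the `toy_surject` helper are made up for illustration.

```python
from typing import Dict, List, Optional, Tuple


def toy_surject(
    read_path: List[int],
    ref_offset: Dict[int, int],
    node_len: Dict[int, int],
) -> Optional[Tuple[int, int]]:
    """Project a read's node path onto the reference path.

    ref_offset maps each node on the reference path to its start offset;
    nodes missing from it exist only on non-reference alleles.
    Returns the reference interval [start, end), or None if the read is dropped.
    """
    on_ref = [n for n in read_path if n in ref_offset]
    if not on_ref:
        # The read lies entirely inside non-reference sequence: nothing to project.
        return None
    start = min(ref_offset[n] for n in on_ref)
    end = max(ref_offset[n] + node_len[n] for n in on_ref)
    return start, end


# Reference path 1 -> 2 -> 4; node 3 is a 5 kb insertion between nodes 2 and 4.
node_len = {1: 100, 2: 50, 3: 5000, 4: 100}
ref_offset = {1: 0, 2: 100, 4: 150}

print(toy_surject([2, 3], ref_offset, node_len))  # -> (100, 150), read at the breakpoint
print(toy_surject([3], ref_offset, node_len))     # -> None, read inside the insertion
```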

vg call works best for genotyping variants already present in the graph. You can try using it to call novel variants with the vg augment approach, but that introduces a lot of noise from sequencing errors and unnormalized edits, and vg call does not handle the noise very well.
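A toy illustration of the two noise sources mentioned above, not how vg augment itself works. Each read reports edits relative to the reference: the same 1 bp homopolymer deletion shows up at different positions until it is left-aligned, and a sequencing error appears as an edit seen in a single read. Normalizing first and then requiring a minimum number of supporting reads keeps the real deletion and drops the error. The sequence, read edits, edit encoding and threshold are all hypothetical; vg augment has its own filtering options, listed by `vg augment --help`.

```python
from collections import Counter
from typing import List, Set, Tuple, Union

# (kind, 0-based reference position, payload): payload is the alt base for
# "SNV" and the deleted length for "DEL".
Edit = Tuple[str, int, Union[str, int]]


def left_align_deletion(ref: str, pos: int, length: int) -> int:
    """Shift a deletion of ref[pos:pos+length] left while the result is unchanged."""
    while pos > 0 and ref[pos - 1] == ref[pos + length - 1]:
        pos -= 1
    return pos


def normalize(ref: str, edit: Edit) -> Edit:
    kind, pos, payload = edit
    if kind == "DEL":
        return kind, left_align_deletion(ref, pos, int(payload)), payload
    return edit


def supported_edits(ref: str, reads: List[List[Edit]], min_support: int) -> Set[Edit]:
    """Normalize each read's edits, then keep those seen in >= min_support reads."""
    support: Counter = Counter()
    for edits in reads:
        support.update({normalize(ref, e) for e in edits})
    return {e for e, n in support.items() if n >= min_support}


ref = "ACGTCAAAAGT"  # homopolymer of four A's at positions 5-8
reads = [
    [("DEL", 8, 1)],                   # the same 1 bp deletion, reported
    [("DEL", 6, 1)],                   # at three different positions
    [("DEL", 5, 1), ("SNV", 2, "T")],  # plus a sequencing error
]
print(supported_edits(ref, reads, min_support=2))  # -> {('DEL', 5, 1)}
print(supported_edits(ref, reads, min_support=1))  # both edits are kept
```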

What you want is closer to genome inference than variant calling. You would need a variant caller that works directly with the pangenome graph. After calling variants relative to the graph, you would infer the most likely haplotype paths in the graph and then use the graph to get the alignment between those paths and the paths corresponding to the reference genome.
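The last step, reading variants off the graph once the haplotype paths are known, can be sketched on a toy acyclic graph: walk the reference path and an inferred haplotype path together and report each stretch where they leave the shared nodes as a (reference offset, reference sequence, alternative sequence) record. This is an illustration under stated assumptions (both paths start and end on shared nodes and visit shared nodes in the same order), not a variant caller; for real graphs, `vg deconstruct` is the vg subcommand that reports graph paths as VCF relative to a reference path. Node IDs, sequences and `path_variants` are hypothetical.

```python
from typing import Dict, List, Tuple


def path_variants(
    ref_path: List[int], hap_path: List[int], seq: Dict[int, str]
) -> List[Tuple[int, str, str]]:
    """Report (ref offset, ref seq, alt seq) wherever the two paths diverge."""
    shared = set(ref_path) & set(hap_path)
    variants = []
    ref_pos = 0
    i = j = 0
    while i < len(ref_path) and j < len(hap_path):
        if ref_path[i] == hap_path[j]:
            # Both paths traverse the same node: no variation here.
            ref_pos += len(seq[ref_path[i]])
            i += 1
            j += 1
            continue
        # Advance each path to its next shared node; the skipped nodes form a bubble.
        ri, hj = i, j
        while ri < len(ref_path) and ref_path[ri] not in shared:
            ri += 1
        while hj < len(hap_path) and hap_path[hj] not in shared:
            hj += 1
        if ri == i and hj == j:
            raise ValueError("paths visit shared nodes in different orders")
        ref_seq = "".join(seq[n] for n in ref_path[i:ri])
        alt_seq = "".join(seq[n] for n in hap_path[j:hj])
        variants.append((ref_pos, ref_seq, alt_seq))
        ref_pos += len(ref_seq)
        i, j = ri, hj
    return variants


seq = {1: "ACG", 2: "T", 3: "C", 4: "GA", 5: "TTT", 6: "CA"}
reference = [1, 2, 4, 5, 6]  # ACG T GA TTT CA
haplotype = [1, 3, 4, 6]     # SNV at node 2/3, deletion of node 5
print(path_variants(reference, haplotype, seq))  # -> [(3, 'T', 'C'), (6, 'TTT', '')]
```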

Reply 8 months ago by Jouni Sirén ▴ 710