Question

Variant calling analysis on phased assembly


24 months ago

pablo ▴ 330

Hi,

I have yeast assemblies for the same samples: phased assemblies from hifiasm and "haploid" (collapsed) assemblies from the IPA assembler. Both methods give good metrics: an assembly size of about 12.5 Mb and fewer than 30 contigs. For information, the samples are diploid.

I would like to do some variant calling first, and then detect whether there are LOH events between my sample generations. I need to use one of my samples as the reference genome to align my reads against, because it is the "generation 0" sample.

What I did:

assembly (with both hifiasm and IPA)
alignment of my reads to the reference using pbmm2
variant calling using pbsv
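
For readers who want to reproduce the alignment and calling steps, here is a minimal sketch that only builds the command lines; it does not run them. The file names and the `build_commands` helper are hypothetical, and the options shown (`--sort`, `--preset CCS`, `--ccs`) are assumptions about a typical HiFi run, not necessarily what was used here.

```python
import shlex


def build_commands(ref_fa, reads_bam, sample):
    """Return the pbmm2/pbsv command lines for one sample as argument lists.

    Output file names are derived from the sample name (hypothetical naming).
    """
    aligned = f"{sample}.aligned.bam"
    svsig = f"{sample}.svsig.gz"
    vcf = f"{sample}.sv.vcf"
    return [
        # Align the reads to the chosen reference and sort the output BAM.
        ["pbmm2", "align", ref_fa, reads_bam, aligned, "--sort", "--preset", "CCS"],
        # Collect structural-variant signatures from the alignments.
        ["pbsv", "discover", aligned, svsig],
        # Call structural variants from the signatures against the same reference.
        ["pbsv", "call", "--ccs", ref_fa, svsig, vcf],
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before being run by hand.
    for cmd in build_commands("gen0.fasta", "gen5.hifi.bam", "gen5"):
        print(shlex.join(cmd))
```

Printing the commands rather than executing them keeps the sketch safe to run on a machine without the PacBio tools installed.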

Now, should I use only the "diploid" (phased) assembly as the reference? I think it could be better, because:

If I use the "haploid" assembly: for each heterozygous variant present in my reads, the assembly keeps only one of the two alleles. When I align the reads of the other samples, that het variant will be detected in any case; for example, an indel will be called either as a deletion or as an insertion, depending on which allele the assembly retained. But if the same variant becomes homozygous (an LOH event), I will be able to detect it only if the haploid copy retained in the assembly does not carry the variant.
Isn't this always the main problem when using one copy of a polyploid genome as the reference?
With the phased assembly, could any LOH event be recovered, assuming the het variants are correctly separated in my phased reference?
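
To make the LOH reasoning above concrete, here is a minimal sketch, assuming the genotypes have already been extracted from the VCFs into plain dictionaries keyed by `(contig, position)`. The function names and the input layout are hypothetical, and sites missing from the later sample are skipped rather than assumed to be homozygous reference, since a missing call may just mean no coverage.

```python
def _alleles(gt):
    """Split a VCF genotype string such as '0/1' or '1|1' into its alleles."""
    return gt.replace("|", "/").split("/")


def is_het(gt):
    """True if the genotype is fully called and carries two different alleles."""
    alleles = _alleles(gt)
    return "." not in alleles and len(set(alleles)) > 1


def is_hom(gt):
    """True if the genotype is fully called and all alleles are identical."""
    alleles = _alleles(gt)
    return "." not in alleles and len(set(alleles)) == 1


def loh_candidates(gen0, later):
    """Return sites that are heterozygous in gen0 and homozygous in a later sample.

    gen0, later: dict mapping (contig, position) -> genotype string.
    Sites absent from `later` are skipped (assumption: absence is not evidence).
    """
    hits = []
    for site in sorted(gen0):
        gt0 = gen0[site]
        gt1 = later.get(site)
        if gt1 is None or not is_het(gt0):
            continue
        if is_hom(gt1):
            hits.append((site, gt0, gt1))
    return hits
```

With a collapsed reference, a site that is `0/1` in generation 0 and `0/0` in a later generation counts as a candidate just like `0/1` to `1/1`, but only if the later call set reports `0/0` sites explicitly.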

Any suggestion?

Best

variant-calling bam pbsv • 1.1k views

updated 24 months ago by Ram 45k • written 24 months ago by pablo ▴ 330

score 0 · Answer 1 · 2023-06-13

This isn't easy. I don't think the world of bioinformatics has a plan at present for performant (small) variant calling on multiple diploid references.

Maybe the toolset that currently comes closest is PGGB, which is intended for pangenomes. It builds a pangenome graph from multiple FASTA files, and you get the graph in odgi format plus a VCF with variant calls derived from it. Consider the naming of your haplotypes carefully before starting, though.
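
On the naming point, here is a minimal sketch of one way to rename contigs before building the graph, assuming the PanSN-style convention `sample#haplotype#contig` that PGGB workflows commonly use. The helper names and the example sample name are hypothetical; check the PGGB documentation for the exact scheme your version expects.

```python
def pansn_name(sample, haplotype, contig, sep="#"):
    """Build a PanSN-style sequence name: sample, haplotype and contig joined by '#'."""
    return f"{sample}{sep}{haplotype}{sep}{contig}"


def rename_fasta_headers(lines, sample, haplotype):
    """Yield FASTA lines with each header rewritten to a PanSN-style name.

    Only the first whitespace-delimited token of the header is kept as the
    contig name; sequence lines are passed through unchanged.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            contig = line[1:].split()[0]
            yield ">" + pansn_name(sample, haplotype, contig)
        else:
            yield line
```

For a phased hifiasm assembly this would be run once per haplotype FASTA, for example with `haplotype=1` and `haplotype=2` under the same sample name, so that both copies stay linked to their sample in the graph.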

Minigraph or odgi pav might also be useful if you are only interested in larger variations like SVs.