Hi,
I have yeast assemblies: phased assemblies from hifiasm and "haploid" assemblies from ipa for the same samples. Both methods give good metrics: an assembly size of about 12.5 Mb and fewer than 30 contigs. For information, the samples are diploid.
I would like to do some variant calling first, and then detect whether there are any LOH (loss of heterozygosity) events between my sample generations. I need to use one of my samples as the reference genome to align my reads against, because it is the "generation 0" sample.
What I did:
- assembly (both hifiasm and ipa)
- align my reads to the reference using pbmm2
- variant calling using pbsv
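For readers who want to reproduce the steps above, here is a minimal sketch of the alignment and calling commands, built as a dry run in Python. The file names (gen0.fasta, gen5.hifi.bam) and the build_commands helper are hypothetical, and the CCS preset is an assumption about HiFi input, not necessarily the options used here. Check the flags against your installed pbmm2 and pbsv versions.

```python
import shlex


def build_commands(reference, reads, prefix):
    """Build the pbmm2/pbsv command lines for one sample (dry run, nothing is executed)."""
    aligned = f"{prefix}.aligned.bam"
    svsig = f"{prefix}.svsig.gz"
    vcf = f"{prefix}.sv.vcf"
    return [
        # Align reads to the generation 0 assembly and sort the output.
        # "--preset CCS" is an assumption for HiFi reads.
        ["pbmm2", "align", reference, reads, aligned, "--sort", "--preset", "CCS"],
        # Collect structural-variant signatures from the alignments.
        ["pbsv", "discover", aligned, svsig],
        # Call structural variants against the same reference.
        ["pbsv", "call", reference, svsig, vcf],
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for cmd in build_commands("gen0.fasta", "gen5.hifi.bam", "gen5"):
        print(shlex.join(cmd))
```

The same reference file is passed to both pbmm2 align and pbsv call, since the calls are only meaningful relative to the assembly the reads were aligned to.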
Now, should I use only the "diploid" (phased) assembly as reference? I think it could be better because:
- if I use the "haploid" assembly: for a heterozygous variant present in my reads, only one of the two alleles is retained in the haploid assembly. During the read alignment step, the same het variant in the other samples will then be detected in any case; for a deletion, for example, it is called either as a deletion or as an insertion, depending on which allele the assembly retained. But if the same variant becomes homozygous (LOH event), I will be able to detect it only if the retained haploid copy does not contain the variant.
- isn't this always the main problem when using a single copy of a polyploid genome as the reference?
- with the phased assembly, could any LOH event be recovered, assuming the het variants are correctly separated between the haplotypes of my phased reference assembly?
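To make the haploid-reference concern concrete, here is a toy sketch of a single heterozygous site. It is an illustrative model, not how pbsv works: the allele labels "A" and "B" and the call_against_reference helper are hypothetical, and a caller is assumed to report simply how many of the sample's two alleles differ from the allele retained in the reference.

```python
def call_against_reference(ref_allele, sample_genotype):
    """Return a VCF-style genotype for one site.

    ref_allele: the allele retained in the haploid reference.
    sample_genotype: the two alleles carried by the diploid sample.
    """
    non_ref = sum(1 for allele in sample_genotype if allele != ref_allele)
    return {0: "0/0", 1: "0/1", 2: "1/1"}[non_ref]


# Generation 0 is heterozygous A/B at this site (toy assumption).
GEN0 = ("A", "B")
LATER = {
    "still het": ("A", "B"),
    "LOH to A": ("A", "A"),
    "LOH to B": ("B", "B"),
}


def summarize():
    """List the calls for each choice of retained allele and each later state."""
    rows = []
    for ref_allele in GEN0:
        gen0_call = call_against_reference(ref_allele, GEN0)
        for label, genotype in LATER.items():
            later_call = call_against_reference(ref_allele, genotype)
            rows.append((ref_allele, label, gen0_call, later_call))
    return rows


if __name__ == "__main__":
    for ref_allele, label, gen0_call, later_call in summarize():
        print(f"ref={ref_allele}  {label:9s}  gen0={gen0_call}  later={later_call}")
```

In this model, an LOH toward the non-reference allele shows up as a 0/1 to 1/1 change, while an LOH toward the retained allele shows up only as a call that disappears (0/0), which is hard to tell apart from a missed call or low coverage.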
Any suggestions?
Best