Hello,
I have a diploid yeast genome. A gene cassette is inserted heterozygously, which was confirmed by PCR in this sample.
The sample was sequenced with PacBio CCS (HiFi) technology. I did a diploid assembly of this sample with hifiasm, which gave me a contiguous assembly (almost one contig per chromosome).
What I want to verify is whether this cassette is found on only one of the two haplotypes. I have the cassette sequence (about 3 kb), which I aligned to this assembly. I found it on both haplotypes, whereas I expect it on only one (heterozygous), so it seems hifiasm collapsed the heterozygous variation.
Is there a way to detect this heterozygous variant from the reads? I was thinking of using the S288C reference genome (which doesn't have the cassette):
- BLAST or align the cassette sequence to the PacBio reads, and align the PacBio reads to the S288C genome.
- Then check in IGV, at the S288C position, where the cassette-containing reads are, and determine the proportion (expected ~50/50) of reads with the insertion relative to S288C versus reads without it.
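The two steps above can be sketched in a few lines of Python. This is only a rough illustration, assuming the cassette was used as the BLAST query against a database of reads and the output is in tabular format (`-outfmt 6`), so the read name is in the second column. The identity and length thresholds, the function names, and the toy read names are all hypothetical. The set of reads overlapping the S288C position has to be obtained separately (for example from the BAM file of reads aligned to S288C).

```python
def reads_with_cassette(blast_lines, min_identity=95.0, min_aln_len=1000):
    """Collect read IDs that have a convincing hit to the cassette.

    Assumes BLAST tabular output (outfmt 6) with the cassette as query,
    so column 2 is the read ID, column 3 the percent identity and
    column 4 the alignment length. Thresholds are arbitrary examples.
    """
    hits = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        read_id = fields[1]
        identity = float(fields[2])
        aln_len = int(fields[3])
        if identity >= min_identity and aln_len >= min_aln_len:
            hits.add(read_id)
    return hits


def cassette_fraction(locus_reads, cassette_reads):
    """Count reads at the locus with and without the cassette."""
    locus_reads = set(locus_reads)
    if not locus_reads:
        raise ValueError("no reads at the locus")
    n_with = len(locus_reads & set(cassette_reads))
    n_without = len(locus_reads) - n_with
    return n_with, n_without, n_with / len(locus_reads)


# Toy example with made-up read names
blast = [
    "cassette\tread1\t99.8\t3000",
    "cassette\tread2\t99.5\t2950",
    "cassette\tread3\t80.0\t200",   # weak hit, ignored
]
locus = ["read1", "read2", "read3", "read4"]
print(cassette_fraction(locus, reads_with_cassette(blast)))
```

Only reads that actually cover the insertion site should go into the locus set, otherwise reads that stop before the site get counted as "without cassette".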
Any help? Best
Were you able to completely assemble the two haplotypes separately because the genome is highly heterozygous? In that case you would definitely expect the cassette insertion to be phased as well... Can you verify your phasing in some way? Can you try another phased assembly method?
Otherwise, if you are not really concerned about the phasing, just align the reads to your assembly, then look at the cassette region for reads that align to either side of the cassette. You should also see a relative drop in coverage. You could also extract all the reads that align in the region and perform a local assembly... but it is still strange that the cassette was not phased in the first place. In any case, there is no real need to use the reference.
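To make "reads that align to either side of the cassette" a bit more concrete, here is a minimal sketch that classifies a read from its alignment start and CIGAR string, assuming the read was aligned to a contig that contains the cassette. The idea is that a read from the haplotype without the cassette should span both flanks but carry a deletion covering most of the cassette interval. The function names, the coordinates and the 80% threshold are hypothetical, and some aligners may soft-clip such reads instead of reporting a long deletion, so this would miss those.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(pos, cigar):
    """Return (start, end), 1-based inclusive, covered on the reference."""
    ref_len = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in "MDN=X")
    return pos, pos + ref_len - 1


def classify_read(pos, cigar, cassette_start, cassette_end, min_fraction=0.8):
    """Classify a read aligned to the cassette-containing contig.

    Returns "not_spanning" if the alignment does not cover the whole
    cassette interval, "without_cassette" if deletions remove at least
    min_fraction of the cassette, and "with_cassette" otherwise.
    """
    start, end = reference_span(pos, cigar)
    if start > cassette_start or end < cassette_end:
        return "not_spanning"
    ref = pos
    deleted = 0
    for n, op in CIGAR_RE.findall(cigar):
        n = int(n)
        if op in "DN":
            # Deleted bases that fall inside the cassette interval
            overlap = min(ref + n - 1, cassette_end) - max(ref, cassette_start) + 1
            deleted += max(0, overlap)
        if op in "MDN=X":
            ref += n
    cassette_len = cassette_end - cassette_start + 1
    if deleted >= min_fraction * cassette_len:
        return "without_cassette"
    return "with_cassette"


# Toy example: a 3 kb cassette at contig positions 1001-4000
print(classify_read(1, "1000M3000D1000M", 1001, 4000))
print(classify_read(1, "5000M", 1001, 4000))
print(classify_read(2000, "1000M", 1001, 4000))
```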
Sorry for my late answer. I meant that my phased assembly looked contiguous because there are few contigs and the total size is close to the expected genome size. I wanted to verify the phasing with the cassette, which should be found on only one of the two haplotypes, but it was found on both.
I also tried a phased assembly with flye + hapdup: same results, and less contiguous.
Then, what I did:
I looked at these 45 reads in the alignment, and they are all in the range pos-707,479 - pos-729,984. I BLASTed the cassette against them and there was no hit. Can we consider that these 45 reads confirm the heterozygous insertion? 97 reads with the cassette, 45 reads without. I can share an IGV screenshot at this position. We can spot some heterozygous variants (some SNPs, or the deletion on the right), but no drop in coverage.
I also did a local assembly with all the reads around 724,000 bp, but hifiasm gave me exactly the same haplotypes.
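As a side note on the read counts above (97 reads with the cassette, 45 without), one way to check how far that is from the ~50/50 split expected for a heterozygous insertion is an exact two-sided binomial test. The sketch below uses only the standard library. The function name is hypothetical, and the test assumes each read samples one of the two haplotypes independently with equal probability, which may not hold exactly in practice.

```python
from math import comb


def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k out of n trials.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small tolerance so floating-point ties are counted
    total = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, total)


with_cassette = 97
without_cassette = 45
n = with_cassette + without_cassette

print("fraction with cassette:", round(with_cassette / n, 3))
print("two-sided p-value vs 50/50:", binom_two_sided_p(with_cassette, n))
```

Either way, the presence of reads that span the site and have no cassette hit is the main evidence for a haplotype without the insertion. The test only says whether the ratio is compatible with an even split.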