Convert two haplotypes into a diploid genome assembly
7 weeks ago
Yiyi • 0

I need a genome assembly for a plant species, which is diploid. I downloaded two haplotype-resolved assemblies from a research paper (HiFi + Hi-C, chromosome-scale genome assembly). I wondered how to recover the diploid genome assembly from them. I noticed that hifiasm may be able to do this, but I am still confused after reading the manual. I also need the gene annotation GFF file. The authors provide a GFF file for each haplotype, but how can I obtain a GFF file for the diploid genome? I assume the duplicates should be removed from the two GFF files, but how should the variants be handled (when the same location has different functions and annotations in each haplotype)?

Any suggestions?

Many thanks!

genome assembly • 370 views
6 weeks ago

Actually, what you seem to want is a haploid (collapsed) representation of the genome, not a diploid one (a diploid assembly consists of two haplotypes, which you already have).

Until about 2022, almost all assembled genomes, whether from haploid, diploid or polyploid organisms, were assembled into a single haploid representation, such as hg19 in the human world.

These days, with long reads and Hi-C, it is possible to assemble the individual haplotypes. This is useful for detailed analyses but doubles the analysis workload.

If your two haplotypes are fairly similar and one is of better quality than the other (run stats.sh from the BBMap package to get N50 values and contig-length distributions, if contig-level data are available), then I would use that one as the primary haplotype.
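If stats.sh is not at hand, the following is a rough Python sketch for comparing the contiguity of the two haplotypes. It assumes each haplotype is a plain or gzipped FASTA file and approximates contigs by splitting each scaffold at runs of N, which may not match exactly how stats.sh defines contigs. The script name and file names in the usage line are hypothetical.

```python
import gzip
import re
import sys


def read_fasta(path):
    """Yield (name, sequence) pairs from a plain or gzipped FASTA file."""
    opener = gzip.open if path.endswith(".gz") else open
    name, chunks = None, []
    with opener(path, "rt") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                fields = line[1:].split()
                name, chunks = (fields[0] if fields else ""), []
            else:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def contig_lengths(seq):
    """Approximate contigs by splitting a scaffold at runs of N (assumed gaps)."""
    return [len(piece) for piece in re.split(r"[Nn]+", seq) if piece]


def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def summarize(path):
    """Collect scaffold- and contig-level statistics for one haplotype FASTA."""
    scaffolds, contigs = [], []
    for _, seq in read_fasta(path):
        scaffolds.append(len(seq))
        contigs.extend(contig_lengths(seq))
    return {
        "total_bp": sum(scaffolds),
        "scaffolds": len(scaffolds),
        "scaffold_N50": n50(scaffolds),
        "contigs": len(contigs),
        "contig_N50": n50(contigs),
    }


if __name__ == "__main__":
    # Print one tab-separated summary line per FASTA file given on the command line.
    for fasta in sys.argv[1:]:
        stats = summarize(fasta)
        print(fasta, *(f"{k}={v}" for k, v in stats.items()), sep="\t")
```

Usage (hypothetical names): `python compare_haplotypes.py hap1.fa.gz hap2.fa.gz`. A higher contig N50 with fewer contigs suggests the more contiguous haplotype, though contiguity is only one aspect of assembly quality.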

The problem is that bioinformatics is not really ready for the analysis of truly haplotype-resolved/diploid genomes. Pangenomics is one way around this, but it is not yet productive or competitive with linear-reference analysis and, IMHO, will not be for years to come.

Oh, and no, you can't just merge the haplotypes easily, AFAIK.


Yes, you are absolutely right. I now realize that a genome from the conventional assembly approach is a single haploid representation; in effect, that assembly is a hybrid of the two haplotypes.

I appreciate your comments and suggestions! You made my day.
