Hello everyone,
Do you know of any tool that allows you to calculate the heterozygosity of an assembly? The genome I have is from a diploid plant.
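Do you have access to the raw sequencing data for the assembly? If so, you can map the raw reads to the assembly, call variants, and then count the loci with heterozygous SNPs. Dividing that count by the number of bases with enough coverage to call a genotype gives a heterozygosity estimate.
However, this value can be inflated by copy-number-variable loci, where reads from collapsed paralogous copies look like heterozygous sites. You could probably disentangle some of these loci by checking coverage maps for regions with unusually high depth and by flagging loci with more than 2 variants.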
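As a rough illustration of the counting step, here is a minimal Python sketch that tallies heterozygous biallelic SNPs from an uncompressed, single-sample VCF. The function names and the `callable_bases` parameter are made up for this example, and the denominator (how many bases had enough coverage to be genotyped) is something you would have to work out yourself, for instance from a depth file. It does no quality or depth filtering, so treat it as a starting point rather than a finished pipeline.

```python
def count_het_snps(vcf_lines):
    """Count biallelic SNP records whose first sample has a heterozygous genotype."""
    het = 0
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no sample column, nothing to genotype
        ref, alt, fmt, sample = fields[3], fields[4], fields[8], fields[9]
        # Keep only biallelic SNPs: one-base REF and one-base ALT.
        if len(ref) != 1 or len(alt) != 1 or alt in {".", "*"}:
            continue
        keys = fmt.split(":")
        if "GT" not in keys:
            continue
        gt = sample.split(":")[keys.index("GT")]
        alleles = gt.replace("|", "/").split("/")
        # Heterozygous means two called alleles that differ (0/1, 0|1, ...).
        if len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]:
            het += 1
    return het


def heterozygosity(het_snps, callable_bases):
    """Heterozygous SNPs per callable base (callable_bases is supplied by the user)."""
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    return het_snps / callable_bases


# Tiny made-up VCF used only to show the behaviour.
example_vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:30\n",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t1/1:28\n",
    "chr1\t300\t.\tG\tGA\t50\tPASS\t.\tGT:DP\t0/1:25\n",
    "chr1\t400\t.\tT\tC\t50\tPASS\t.\tGT:DP\t0|1:31\n",
    "chr1\t500\t.\tA\tC,G\t50\tPASS\t.\tGT:DP\t1/2:60\n",
]
n_het = count_het_snps(example_vcf)
print(n_het, heterozygosity(n_het, 1000))
```

On a real file you would pass an open file handle instead of the list, e.g. `count_het_snps(open("calls.vcf"))`, where `calls.vcf` is a placeholder name. Sites with more than one ALT allele are skipped on purpose, since those are the ones worth inspecting separately for copy-number issues.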
Hi,
It is difficult to give you an accurate answer without having more details about your assembly or the sequencing data you used to produce it.
An assembly is usually a haploid representation of the genome, so the concept of heterozygosity does not apply to it (unless it is a phased assembly).
If you want to estimate the level of heterozygosity of the genome itself, you can use a k-mer based approach on the sequencing reads, for example Jellyfish to count k-mers followed by GenomeScope to fit a model to the k-mer histogram (cf: http://qb.cshl.edu/genomescope/). This approach works best on reads with low error rates (usually short reads, or the recent high-quality long reads).
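To make it clearer what the k-mer histogram is, here is a toy Python sketch that counts canonical k-mers in a handful of reads and reports, for each multiplicity, how many distinct k-mers were seen that many times. It is only meant to show the idea; it holds everything in memory and is not a substitute for Jellyfish on real data. The function names are made up for this example.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_histogram(reads, k):
    """Map multiplicity -> number of distinct canonical k-mers with that count."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip k-mers containing N or other ambiguous bases.
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


# Two identical toy reads; real input would be millions of reads with k around 21.
hist = kmer_histogram(["ACGTA", "ACGTA"], k=3)
for multiplicity in sorted(hist):
    print(multiplicity, hist[multiplicity])
```

In a heterozygous diploid genome this histogram typically shows two peaks: one at roughly half the sequencing coverage (k-mers that are present on only one haplotype) and one at full coverage (k-mers shared by both haplotypes). The relative size of the two peaks is what the model uses to estimate heterozygosity.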
Finally, it is not exactly what you asked, but if you want to assess the level of duplication in your assembly (which often results from heterozygous regions being assembled as separate copies), you can use tools such as BUSCO: https://busco.ezlab.org/.