9.8 years ago
Prakki Rama
Hi all,
How do we know which peak is homozygous and which is heterozygous when we generate a k-mer plot for estimating genome size? I would be thankful for your directions.
Thank you. But what about the other small peaks that appear in the plot after the homozygous peak? They must be repetitive regions with higher coverage, am I right?
Yes, additional peaks after the C2-peak (diploid genome peak) represent regions with higher copy number, such as repeats. However, to form a peak, you need a large region or many sequences with very similar copy numbers.
Repeats usually don't form a peak, as each repeat is small and different repeats have different copy numbers.
There are exceptions, though. For example, I have a plot from a small genome with high gene content that shows a small, distinct peak at C4. This peak comprises duplicated gene families. Mitochondria and chloroplasts also produce their own peaks at their respective coverage (often 100 to 10,000 times the genome coverage). Partial genome duplications or chromosome aberrations can produce additional distinct peaks as well, and bacterial contamination, symbionts, and parasites might also produce peaks.
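You can estimate the "size" of a peak to get an idea of what it represents. Simply sum up count*coverage over the peak region: for each coverage value in the peak, multiply the number of k-mers by that coverage, then add up the products.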
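In case it helps, here is a minimal Python sketch of that sum. It assumes the histogram is plain text with two whitespace-separated columns per line (coverage, then number of k-mers at that coverage). The function names, the peak boundaries, and the toy numbers are made up for illustration; you would pick the boundaries by eye from your own plot.

```python
def parse_histogram(text):
    """Parse two-column lines 'coverage count' into a list of (coverage, count)."""
    rows = []
    for line in text.strip().splitlines():
        coverage, count = line.split()[:2]
        rows.append((int(coverage), int(count)))
    return rows


def peak_size(histogram, low, high):
    """Sum count * coverage over all bins with low <= coverage <= high."""
    return sum(cov * count for cov, count in histogram if low <= cov <= high)


# Toy data, invented for illustration only (not from a real genome).
toy_text = """
18 100
19 400
20 1000
21 400
22 100
40 50
41 20
"""

histogram = parse_histogram(toy_text)
main_peak = peak_size(histogram, 18, 22)   # peak boundaries chosen by eye
extra_peak = peak_size(histogram, 40, 41)

print(main_peak, extra_peak)
```

The result is a number of k-mer occurrences, not base pairs. Dividing it by the coverage of the main genome peak should give a rough length in bases, in the same way the usual genome size estimate works, but treat that as an approximation.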