Hi, I just got a de novo genome assembly, and I want to estimate the genome size. According to some published papers, this can be done using k-mer coverage, but I don't quite follow the method: how do I cut the genome into k-mers of a chosen length? And how do I summarize the k-mer abundance and plot a Poisson distribution, as in the papers?
Could anybody suggest a software package, a Perl module, or even some pseudocode? Thanks for your help!
Can you mention the paper you are referring to?
Sure, for example "The genome of the domesticated apple", Nature Genetics 2010, supplementary page 9.
There is no mention of k-mers in the supplementary material; I believe you confused k-mers with reads. Otherwise, you should read about the Lander-Waterman statistics. The only difficult part is fitting the Poisson distribution mentioned in the article.
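For reference, here is a minimal Python sketch of the basic Lander-Waterman relations, assuming reads land uniformly at random on the genome. The function names are made up for illustration and do not come from the paper or from any library.

```python
import math

def expected_coverage(num_reads, read_length, genome_size):
    """Mean depth c = N * L / G under the Lander-Waterman model."""
    return num_reads * read_length / genome_size

def genome_size_from_coverage(num_reads, read_length, coverage):
    """Rearranged form: G = N * L / c, with c taken from the fitted depth peak."""
    return num_reads * read_length / coverage

def depth_probability(coverage, depth):
    """Poisson probability that a given base is covered exactly `depth` times."""
    return math.exp(-coverage) * coverage ** depth / math.factorial(depth)

# Hypothetical numbers: 1,000,000 reads of 100 bp with a fitted mean depth of 10
print(genome_size_from_coverage(1_000_000, 100, 10.0))
```

The mean depth c is the quantity the Poisson fit has to provide; once it is known, the genome size follows from the total number of sequenced bases.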
+1. Even though I wrote about K-mers, this article seems to have nothing about K-mers.
Sorry about that, I mixed them up. But please check this one: "Genome sequencing reveals insights into physiology and longevity of the naked mole rat", supplementary p. 3, doi:10.1038/nature10533
In that case, you should know that the first link I pointed to, which explains how k-mer coverage relates to genome size, is for a tool (Quake) that provides everything you've asked for.
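Since pseudocode was requested, here is a rough Python sketch of the general idea, not Quake's implementation: slide a window of length k along each read, count every k-mer, build the abundance histogram, then divide the total number of k-mers by the depth at the main peak. The function names and the min_depth cutoff for skipping low-count (likely erroneous) k-mers are my own assumptions. It also ignores reverse complements and holds everything in memory, so it is only suitable for toy data.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every k-mer seen in the reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                counts[kmer] += 1
    return counts

def abundance_histogram(counts):
    """Map each depth d to the number of distinct k-mers seen d times."""
    return Counter(counts.values())

def estimate_genome_size(histogram, min_depth=1):
    """Estimate size as (total k-mers) / (peak depth), ignoring depths below min_depth."""
    kept = {d: n for d, n in histogram.items() if d >= min_depth}
    if not kept:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(kept, key=kept.get)  # depth with the most distinct k-mers
    total_kmers = sum(d * n for d, n in kept.items())
    return total_kmers / peak_depth, peak_depth

# Toy example: the same 6 bp sequence read three times, k = 3
reads = ["ACGTAC"] * 3
hist = abundance_histogram(count_kmers(reads, 3))
print(sorted(hist.items()))        # [(3, 4)]
print(estimate_genome_size(hist))  # (4.0, 3)
```

Plotting the histogram (depth on the x axis, number of distinct k-mers on the y axis) gives the curve shown in the papers. Note that the peak is a k-mer depth, not a base depth: for reads of length L, k-mer depth = base depth * (L - k + 1) / L.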