How To Estimate Genome Size Using K-Mer Coverage
12.5 years ago
GAO Yang ▴ 250

Hi, I just got a de novo genome assembly and I want to estimate the genome size. According to some published papers, this can be done using k-mer coverage, but I don't quite follow the method: how do I cut the genome into k-mers of a chosen length? And how do I summarize the k-mer abundance and plot the Poisson-like distribution, as in the papers?

Could anybody suggest a software package, a Perl module, or even some pseudo-code? Thanks for your help!

coverage genome • 10.0k views

Can you mention the paper you are referring to?


Sure, for example "The genome of the domesticated apple", Nature Genetics 2010, supplementary page 9.


There is no mention of k-mers in the supplementary material; I believe you confused k-mers with reads. Otherwise, you should read about the Lander-Waterman statistics. The only difficult part is fitting the Poisson distribution mentioned in the article.
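For reference, a short sketch of the read-level statistics and how they carry over to k-mers. The symbols are introduced here for illustration and are not taken from either paper: $N$ reads of length $L$, genome size $G$, k-mer length $k$, base coverage $c$, and k-mer coverage $c_k$. The Poisson form assumes reads are sampled uniformly and ignores sequencing errors and repeats.

```latex
% Lander-Waterman: expected base coverage and Poisson depth distribution
c = \frac{N L}{G}, \qquad P(X = x) = \frac{c^{x} e^{-c}}{x!}

% Each read of length L contributes (L - k + 1) k-mers, so
c_k = c \, \frac{L - k + 1}{L}
\quad\Longrightarrow\quad
G \approx \frac{N \, (L - k + 1)}{c_k}
```

In words: under these assumptions, the genome size is roughly the total number of k-mers in the reads divided by the k-mer depth at the main peak of the distribution.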


+1. Even though I wrote about K-mers, this article seems to have nothing about K-mers.


Sorry about that, I mixed them up. But please check this one: "Genome sequencing reveals insights into physiology and longevity of the naked mole rat", supplementary page 3, doi:10.1038/nature10533


In that case, you should know that the first link I pointed to, which explains how k-mer coverage relates to genome size, is a tool (Quake) that can produce everything you've asked for.

12.5 years ago
Arun 2.4k

Regarding your first question, about k-mer coverage and genome size: different software packages seem to use different methods/algorithms. EDIT: The general idea is explained very well here. I don't follow what you mean by "how to cut the genome to chosen K-mer"; could you please elaborate? As for the k-mer distribution, you obtain it by building a histogram/density plot that bins k-mers by coverage. You'll see a smooth curve that resembles a Poisson distribution. If there is bias in your k-mer distribution, you'll normally see an initial peak like this, from which you can decide the coverage cut-off needed to get rid of that bias.
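A minimal Python sketch of the idea, in case pseudo-code helps. This is not what Quake or any particular tool does internally. It assumes reads are plain strings, that the low-coverage "initial peak" comes from errors and ends at the first valley of the histogram, and that genome size is approximately (total k-mers above the valley) / (depth of the main peak). All function names are made up for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Slide a window of width k over every read and count each k-mer."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                counts[kmer] += 1
    return counts


def abundance_histogram(counts):
    """Map each depth d to the number of distinct k-mers seen exactly d times."""
    return Counter(counts.values())


def find_valley_and_peak(hist):
    """Return (valley, peak): the end of the low-depth peak and the main peak."""
    max_depth = max(hist)
    valley = 1
    # Walk right while the histogram is not rising (assumed error peak).
    while valley < max_depth and hist.get(valley + 1, 0) <= hist.get(valley, 0):
        valley += 1
    # The main peak is the most populated depth at or beyond the valley.
    peak = max(range(valley, max_depth + 1), key=lambda d: hist.get(d, 0))
    return valley, peak


def estimate_genome_size(hist):
    """Estimate genome size as (k-mers at depth >= valley) / peak depth."""
    valley, peak = find_valley_and_peak(hist)
    total_kmers = sum(d * n for d, n in hist.items() if d >= valley)
    return total_kmers / peak
```

Plotting `sorted(hist.items())` gives the curve described above. A pure-Python `Counter` will not scale to a real read set, so for actual data a dedicated k-mer counter is the practical choice; the sketch is only meant to show what is being counted.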


Yeah, this is what I need! Thanks for that, I'll go ahead with it @_@


By the way, do you know how to apply this software to color-space reads (SOLiD output)? Maybe I need to post another question about it ~ :)
