Question

How To Choose The K-mer Size (K) In SOAPdenovo?

12.4 years ago

Dejian ★ 1.3k

I have read the papers about EULER, Velvet and SOAPdenovo, but I am still confused about how to choose K. It is common practice to test several K values and pick the best one based on the results. But I think there may be some clues indicating the proper range of K, so that K values need not be tested blindly. For example, K obviously has to be less than the maximum read length. Is there a way to roughly estimate the proper range of K from the genome size, sequencing depth, read length or something else? How do you choose K? Many thanks.

Written 12.4 years ago by Dejian ★ 1.3k; updated 2.3 years ago by Ram 45k

Ram · Answer 1 · 2012-12-06

12.4 years ago

Rm 8.3k

Generally, K is set between one half and two thirds of the read length. Too small a K leads to many short contigs, whereas a longer K results in fewer but longer contigs.
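
A minimal Python sketch of that rule of thumb, listing odd K values between one half and two thirds of the read length. The limits `k_min=13` and `k_max=127` are assumptions (roughly the range accepted by SOAPdenovo2 builds); set them for the binary you actually run.

```python
def candidate_ks(read_length, k_min=13, k_max=127):
    """Odd K values between 1/2 and 2/3 of the read length (rule of thumb).

    k_min/k_max are assumed assembler limits; adjust them to your build.
    """
    lo = max(k_min, -(-read_length // 2))    # ceil(read_length / 2)
    hi = min(k_max, (2 * read_length) // 3)  # floor(2 * read_length / 3)
    # Odd K is a common de Bruijn convention: an odd k-mer can never be
    # its own reverse complement.
    return [k for k in range(lo, hi + 1) if k % 2 == 1]


if __name__ == "__main__":
    print(candidate_ks(100))  # e.g. 100 bp reads
```

The result is only a starting range for trial assemblies, not a final answer.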

Written 12.4 years ago by Rm 8.3k

Then you may need to perform several trial runs with different K values in that range and select the best one.
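
A sketch of such a trial sweep, assuming SOAPdenovo2's `all` pipeline with `-s` (library config file), `-K`, `-o` (output prefix) and `-p` (threads). The binary name `SOAPdenovo-127mer` and the config name `lib.cfg` are placeholders, so check the usage text of your own installation. It only builds and prints the commands (a dry run).

```python
import shlex


def sweep_commands(ks, config="lib.cfg", binary="SOAPdenovo-127mer", threads=8):
    """One SOAPdenovo 'all' command per K (flags assumed from SOAPdenovo2 usage)."""
    cmds = []
    for k in ks:
        cmds.append([binary, "all", "-s", config, "-K", str(k),
                     "-o", f"asm_K{k}", "-p", str(threads)])
    return cmds


if __name__ == "__main__":
    for cmd in sweep_commands([51, 55, 59, 63]):
        # Dry run: pass `cmd` to subprocess.run(cmd, check=True) to execute.
        print(shlex.join(cmd))
```

Giving each run its own output prefix keeps the assemblies side by side for comparison.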

Reply written 12.4 years ago by GAO Yang ▴ 250

How do I know which K value gives the best results? Thanks!

Reply written 10.7 years ago by Anchittha.satjarak ▴ 10

After assembly, calculate statistics such as contig N50 and N90, scaffold N50 and N90, and total scaffold length. Usually a better K gives larger contig and scaffold N50/N90 values, but the total scaffold length should not deviate too much from the estimated genome size (estimate the genome size with an experimental method such as flow cytometry).
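
A minimal Python sketch of those statistics. `fasta_lengths` reads sequence lengths from a FASTA file such as SOAPdenovo's `*.scafSeq` scaffold output; `nx` returns N50 for `percent=50` and N90 for `percent=90`. The helper names are illustrative, not part of any assembler.

```python
def fasta_lengths(path):
    """Sequence lengths from a FASTA file (e.g. an assembly's scaffolds)."""
    lengths, current = [], 0
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if current:
                    lengths.append(current)
                current = 0
            else:
                current += len(line)
    if current:
        lengths.append(current)
    return lengths


def nx(lengths, percent):
    """Largest length L such that sequences >= L cover `percent`% of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 100 >= percent * total:  # integer math avoids float edge cases
            return length
    return 0


def assembly_stats(lengths, genome_size):
    """N50, N90, total length and total / expected genome size."""
    total = sum(lengths)
    return {"N50": nx(lengths, 50), "N90": nx(lengths, 90),
            "total": total, "size_ratio": total / genome_size}
```

Run `assembly_stats(fasta_lengths(path), genome_size)` once per trial K, passing your own genome size estimate; a `size_ratio` far from 1 is a warning sign even when N50 looks good.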

Reply written 10.7 years ago by Dejian ★ 1.3k

Quite reasonable. Many thanks.

Reply written 12.4 years ago by Dejian ★ 1.3k

I thought that in a perfect experiment we would want a single contig covering the whole genome. Why is it a bad thing that a longer k-mer results in fewer, longer contigs?

Reply written 9.9 years ago by scchess ▴ 640

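Imperfect coverage and sequencing errors. Each position in a contig must be covered by sufficiently many error-free k-mers, and as k grows a read contains fewer k-mers while each k-mer is more likely to include an error. Take a look at the kmergenie paper for a longer discussion.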
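
To make this concrete, a small sketch using the k-mer coverage relation Ck = C * (L - k + 1) / L (as given in the Velvet manual, with base coverage C and read length L) and the simplifying assumption of independent per-base errors at rate e, so that a k-mer is error-free with probability (1 - e)^k. The coverage and error values in the example are hypothetical.

```python
def error_free_kmer_coverage(base_cov, read_len, k, error_rate=0.0):
    """Expected number of error-free k-mers covering one genome position.

    Ck = C * (L - k + 1) / L converts base coverage to k-mer coverage;
    the (1 - e)**k factor assumes independent per-base errors at rate e.
    """
    if not 1 <= k <= read_len:
        raise ValueError("k must be between 1 and the read length")
    kmer_cov = base_cov * (read_len - k + 1) / read_len
    return kmer_cov * (1.0 - error_rate) ** k


if __name__ == "__main__":
    # Hypothetical data set: 30x coverage, 100 bp reads, 1% per-base error rate.
    for k in (31, 51, 71, 91):
        print(k, round(error_free_kmer_coverage(30, 100, k, 0.01), 1))
```

Both factors shrink as k grows, so at low coverage or high error rates a large K can leave positions with no error-free k-mer at all, which breaks contigs.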

Reply written 9.9 years ago by Rayan Chikhi ★ 1.6k; updated 2.3 years ago by Ram 45k

Is that really true? Shorter k-mers cannot resolve repeats of the same or greater length, but on the other hand they help build a denser graph (basically, two reads are connected in the graph only if they overlap by about the k-mer size). Therefore I guess it depends a lot on the coverage you have: the lower your coverage, the smaller the k you have to choose, because otherwise even non-complex regions won't be resolved.
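
The repeat side of this trade-off can be seen on a toy sequence. `ambiguous_kmers` (a hypothetical helper) counts distinct k-mers that occur more than once, which is where a de Bruijn graph branches or collapses. The sequence contains two copies of the 7 bp repeat GATTACA between distinct flanks; reverse complements are ignored for simplicity.

```python
from collections import Counter


def ambiguous_kmers(seq, k):
    """Count distinct k-mers occurring more than once in seq (no reverse complements)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return sum(1 for c in counts.values() if c > 1)


# Toy genome: two copies of the 7 bp repeat GATTACA with different flanks.
GENOME = "CCCCC" + "GATTACA" + "TTTTT" + "GATTACA" + "GGGGG"

if __name__ == "__main__":
    for k in (5, 7, 8):
        print(k, ambiguous_kmers(GENOME, k))
```

With k = 7 the repeat is still ambiguous, while with k = 8 every 8-mer is unique, because each 8-mer touching a repeat copy also includes a distinguishing flanking base.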

Reply written 9.2 years ago by kamiljaron ▴ 230

score 4 · Answer 2 · 2012-12-07

12.4 years ago

Frédéric Bigey ▴ 320

I suggest having a look at VelvetOptimiser:

VelvetOptimiser is a multi-threaded Perl script for automatically optimising the three primary parameter options (K, -exp_cov, -cov_cutoff) for the Velvet de novo sequence assembler.
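
VelvetOptimiser automates the sweep with its own configurable scoring. As a much simpler sketch of the selection criterion described in this thread, `pick_best_k` (hypothetical) keeps the K with the largest scaffold N50 among runs whose total length stays within a tolerance of the expected genome size. The 20% default tolerance is an arbitrary assumption.

```python
def pick_best_k(results, genome_size, tolerance=0.2):
    """Choose K from {K: (scaffold_n50, total_length)}.

    Only runs whose total length is within `tolerance` (a fraction) of the
    expected genome size are eligible; returns None if none qualifies.
    """
    eligible = {k: n50 for k, (n50, total) in results.items()
                if abs(total - genome_size) <= tolerance * genome_size}
    if not eligible:
        return None
    return max(eligible, key=eligible.get)
```

The size filter guards against K values that inflate N50 while leaving large parts of the genome out of the assembly.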

Written 12.4 years ago by Frédéric Bigey ▴ 320

Thanks for your advice, Frédéric. SOAPdenovo shares its basic rationale with Velvet, so your suggestion should be helpful for choosing K in SOAPdenovo too. I will check it.

Reply written 12.4 years ago by Dejian ★ 1.3k

Ram · Answer 3 · 2014-01-10

11.3 years ago

Hranjeev ★ 1.5k

You may estimate the best k-mer size from the k-mer frequency table. One program that does this specifically is kmergenie. I used the tool earlier and the results were not very promising, but perhaps updates to the software have improved things a bit.
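
A crude illustration of the idea: build a k-mer abundance histogram for each candidate k and prefer the k that yields the most "solid" distinct k-mers. KmerGenie itself fits a statistical model to each histogram to estimate the number of genomic k-mers; the fixed `min_abundance` cutoff here is a simplifying assumption standing in for that model, and reverse complements are not merged.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Abundance histogram {abundance: number of distinct k-mers}."""
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    return Counter(counts.values())


def solid_kmers(reads, k, min_abundance=2):
    """Distinct k-mers seen at least min_abundance times (assumed error filter)."""
    hist = kmer_histogram(reads, k)
    return sum(n for abundance, n in hist.items() if abundance >= min_abundance)


def best_k(reads, ks, min_abundance=2):
    """Candidate k with the most solid distinct k-mers (ties go to the first k)."""
    return max(ks, key=lambda k: solid_kmers(reads, k, min_abundance))
```

On real data the counting step is the expensive part, which matches the remark below about running time.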

I wish it also gave a best k-mer estimate for error-correcting reads.

Pros: You get a k-mer that you can focus on for assembly (obviously).

Cons: The running of the program itself takes quite a while.

Written 11.3 years ago by Hranjeev ★ 1.5k

Hi, kmergenie dev here. Indeed, we're continuously improving the software, and it normally works well for our users. I encourage you to try the latest version and email me (kmergenie@cse.psu.edu) if you get unsatisfactory results. Your feedback can help us identify problems we were not aware of. Thanks.

Reply written 11.0 years ago by Rayan Chikhi ★ 1.6k

Hi,

I get an error when using kmergenie; my question is here: kmergenie [OSError: [Errno 2] No such file or directory]

Thanks!

Reply written 10.7 years ago by wukai199010 • 0; updated 3.4 years ago by Ram 45k

Does one combine forward and reverse reads into one file to run kmergenie? Is it possible to use separate files?

Reply written 7.2 years ago by deepti1rao ▴ 60