Question

kmergenie does not show all k-mers in diploid mode


9.9 years ago

robin.vanvelzen.wur • 0

Dear all,

I am trying out kmergenie to determine the optimal k-mer size for a plant genome assembly. Using the default settings, I get a nice histogram for all the different k-mer sizes, but with the --diploid parameter the output is truncated. See the .dat outputs below.

It seems that many of the k-mer histograms do not have any associated model fit (this is apparent in the HTML output, not shown here). Do you know what may be going wrong?

Many thanks for any advice!

Robin

## Default (haploid) model
$ kmergenie filelist.txt -k 85 -t 8 -o kmergenie:

k genomic.kmers cov.cutoff
15 135570989 1
25 373492340 1
35 425306591 1
45 460648430 1
55 480548886 1
59 487082292 1
61 486570123 1
63 486719928 1
65 488075925 1
67 485561924 1
69 484863969 1
71 483620710 1
75 468859001 1
85 1932760 22

## Diploid model
$ kmergenie filelist.txt --diploid -k 85 -t 8 -o kmergeniediploid #note that estimates for k15, k45 and k>57 are missing

k genomic.kmers cov.cutoff
25 349086825 1
35 390717507 1
51 425391521 1
53 425679437 1
55 426942015 1
57 426544332 1
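For readers comparing the two tables above, here is a minimal Python sketch that parses the .dat-style text, lists the k values from the haploid run that have no entry in the diploid run, and picks the k with the most genomic k-mers in each table. The helper name `parse_dat` and the variable names are hypothetical, the data are copied from this post, and the "largest genomic k-mer count" criterion is my understanding of how kmergenie chooses its best k rather than something stated in this thread.

```python
# Tables copied verbatim from the post (header line included).
HAPLOID_DAT = """k genomic.kmers cov.cutoff
15 135570989 1
25 373492340 1
35 425306591 1
45 460648430 1
55 480548886 1
59 487082292 1
61 486570123 1
63 486719928 1
65 488075925 1
67 485561924 1
69 484863969 1
71 483620710 1
75 468859001 1
85 1932760 22
"""

DIPLOID_DAT = """k genomic.kmers cov.cutoff
25 349086825 1
35 390717507 1
51 425391521 1
53 425679437 1
55 426942015 1
57 426544332 1
"""


def parse_dat(text):
    """Hypothetical helper: map k -> (genomic_kmers, cov_cutoff)."""
    rows = {}
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip the header and anything that is not three integer columns.
        if len(fields) != 3 or not all(f.isdigit() for f in fields):
            continue
        k, genomic_kmers, cutoff = (int(f) for f in fields)
        rows[k] = (genomic_kmers, cutoff)
    return rows


haploid = parse_dat(HAPLOID_DAT)
diploid = parse_dat(DIPLOID_DAT)

# k values fitted in the haploid run but absent from the diploid run.
missing_in_diploid = sorted(set(haploid) - set(diploid))

# Assumed selection criterion: the k with the most genomic k-mers.
best_haploid_k = max(haploid, key=lambda k: haploid[k][0])
best_diploid_k = max(diploid, key=lambda k: diploid[k][0])

print("missing in diploid run:", missing_in_diploid)
print("best k (haploid):", best_haploid_k)
print("best k (diploid):", best_diploid_k)
```

Note that the two runs do not sample exactly the same k values, so a set difference like this only shows which haploid-run entries lack a diploid counterpart.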



Can you please send both HTML reports to kmergenie@cse.psu.edu?

The diploid model is more constrained than the haploid model, so it is more likely to fail to fit a histogram.

written 9.9 years ago by Rayan Chikhi ★ 1.6k

Ram · Answer 1 · 2015-05-26


9.9 years ago

Rayan Chikhi ★ 1.6k

Thanks for sending me the histograms by email.

The coverage is very low (30x 15-mer coverage for homozygous regions), and heterozygosity looks low too. One can barely see a peak that would correspond to heterozygous k-mers, yet this peak is what the diploid model expects. The haploid model should therefore work much better for this type of histogram, and I recommend using it.
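To illustrate why a faint heterozygous peak matters, here is a rough Python sketch. It is not kmergenie's actual model: it simply assumes that k-mer abundances follow a mixture of two Poisson distributions, one centred on the homozygous coverage (30x, as quoted above) and one at half that value for heterozygous k-mers, and then counts local maxima in the resulting curve. The function names and the two heterozygous fractions (0.3 and 0.02) are made up for illustration.

```python
import math


def poisson_pmf(x, mean):
    """Poisson probability of observing abundance x, computed in log space."""
    return math.exp(-mean + x * math.log(mean) - math.lgamma(x + 1))


def mixture_histogram(hom_cov, het_fraction, max_abundance=80):
    """Idealised abundance curve: heterozygous k-mers peak at hom_cov / 2."""
    het_cov = hom_cov / 2.0
    return [
        het_fraction * poisson_pmf(x, het_cov)
        + (1.0 - het_fraction) * poisson_pmf(x, hom_cov)
        for x in range(max_abundance + 1)
    ]


def find_peaks(hist):
    """Return abundances that are local maxima of the curve."""
    return [
        i
        for i in range(1, len(hist) - 1)
        if hist[i] > hist[i - 1] and hist[i] >= hist[i + 1]
    ]


# Illustrative (assumed) heterozygous fractions, not estimates for this data set.
peaks_high_het = find_peaks(mixture_histogram(hom_cov=30, het_fraction=0.3))
peaks_low_het = find_peaks(mixture_histogram(hom_cov=30, het_fraction=0.02))

print("high heterozygosity peaks:", peaks_high_het)
print("low heterozygosity peaks:", peaks_low_het)
```

Under these assumptions, the high-heterozygosity curve should show two separate maxima, while in the low-heterozygosity curve the heterozygous component is absorbed into the rising flank of the homozygous peak. A model that requires two peaks has little to anchor on in the second case.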

On a side note, I expect that the heterozygous regions will not assemble well, while the homozygous regions will assemble better. The k predicted by kmergenie looks reasonable given the histograms.

Thanks for spotting a bug in the documentation; I have corrected it.



Thanks for the help and the advice!

Heterozygosity is indeed low (that was one of the criteria to select the sample for sequencing). So if I understand correctly, the diploid model requires a substantial level of heterozygosity to work. It may be good to mention that requirement in the documentation.

I will use the haploid model except for samples with higher levels of heterozygosity.

written 9.9 years ago by robin.vanvelzen.wur • 0