Question

kmergenie does not predict a best k value

7.0 years ago

StudentBio • 0

Hi, I am trying out kmergenie to determine the optimal k-mer size for a Phoenix dactylifera genome assembly, and I get the error below. Any suggestions, please?

./kmergenie /home/vshare/outils/trimmed.fastq --diploid

running histogram estimation

Setting maximum kmer length to: 133 bp

computing histograms (from k=21 to k=121): 21 31 41 51 61 71 81 91 101 111 121 

ntCard wall-clock time over all k values: 2172 seconds 

fitting model to histograms to estimate best k

could not fit histograms-k101.histo

could not fit histograms-k111.histo

could not fit histograms-k121.histo

could not fit histograms-k21.histo

could not fit histograms-k31.histo

could not fit histograms-k41.histo

could not fit histograms-k51.histo

could not fit histograms-k61.histo

could not fit histograms-k71.histo

could not fit histograms-k81.histo

could not fit histograms-k91.histo

could not predict a best k value

No best k found

kmergenie assembly • 2.8k views

ADD COMMENT • link updated 7.0 years ago by h.mon 35k • written 7.0 years ago by StudentBio • 0

What are the expected genome size and ploidy, and what is the target sequencing coverage? Did you check for contaminants (bacterial, human, whatever), and did you remove sequencing adapters?

ADD REPLY • link 7.0 years ago by h.mon 35k

I'm sorry, but I don't know how to estimate the expected genome size, ploidy, and target sequencing coverage. That is why I'm trying to find the best k: I want to use it with GenomeScope, which detects the genome characteristics. According to the FastQC report: sequence length 20-397 and %GC 42.

About my reads: I trimmed them using sickle.

(I used the --diploid option because, according to a study, the Phoenix dactylifera genome contains 18 pairs of chromosomes.)

ADD REPLY • link 7.0 years ago by StudentBio • 0

According to another study, the genome size should be around 670 Mb. You can calculate the target sequencing coverage using this genome size estimate. These considerations are important for designing the best sequencing strategy and choosing an appropriate assembler.
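As a rough illustration of that coverage calculation: coverage is approximately the total number of sequenced bases divided by the genome size. The Python sketch below is only an assumption about how one might do this, not part of kmergenie or any other tool. The function names `total_bases` and `estimated_coverage` are made up for this example, the 670 Mb figure is the estimate quoted above, and the reader assumes a standard four-line-per-record FASTQ file.

```python
import gzip
from pathlib import Path

# Genome size estimate quoted in this thread (about 670 Mb); an assumption, not a measurement.
GENOME_SIZE_BP = 670_000_000


def total_bases(fastq_path):
    """Sum the read lengths in a FASTQ file (assumes 4 lines per record)."""
    path = Path(fastq_path)
    # Open gzip-compressed files transparently, plain text otherwise.
    opener = gzip.open if path.suffix == ".gz" else open
    total = 0
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # The sequence is the second line of every 4-line record.
            if line_number % 4 == 1:
                total += len(line.strip())
    return total


def estimated_coverage(n_bases, genome_size_bp=GENOME_SIZE_BP):
    """Approximate depth of coverage: sequenced bases divided by genome size."""
    return n_bases / genome_size_bp
```

For the file from the question, this would be called as `estimated_coverage(total_bases("/home/vshare/outils/trimmed.fastq"))`. Very low coverage is one plausible reason for k-mer histograms that cannot be fitted, although this thread does not confirm that this is the cause here.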

Why do you want to assemble the genome if there is a reference genome available? If all the data you have at hand are these short reads (length 20-397), your assembly will most likely be worse than the published genome. What analyses do you intend to perform downstream? I have the feeling that mapping to this reference genome would be a better approach.

ADD REPLY • link 7.0 years ago by h.mon 35k