6.4 years ago
StudentBio
Hi, I am trying out KmerGenie to determine the optimal k-mer value for a Phoenix dactylifera genome assembly, and I get the error below. Any suggestions, please?
./kmergenie /home/vshare/outils/trimmed.fastq --diploid
running histogram estimation
Setting maximum kmer length to: 133 bp
computing histograms (from k=21 to k=121): 21 31 41 51 61 71 81 91 101 111 121
ntCard wall-clock time over all k values: 2172 seconds
fitting model to histograms to estimate best k
could not fit histograms-k101.histo
could not fit histograms-k111.histo
could not fit histograms-k121.histo
could not fit histograms-k21.histo
could not fit histograms-k31.histo
could not fit histograms-k41.histo
could not fit histograms-k51.histo
could not fit histograms-k61.histo
could not fit histograms-k71.histo
could not fit histograms-k81.histo
could not fit histograms-k91.histo
could not predict a best k value
No best k found
What is the expected genome size and ploidy, and target sequencing coverage? Did you check for contaminants (bacterial, human, whatever) and did you remove sequencing adapters?
I'm sorry, but I don't know how to estimate the genome size, ploidy, and target sequencing coverage. That is why I'm trying to find the best k to use with GenomeScope, which detects the genome characteristics. According to the FastQC report: sequence length 20-397 and %GC 42.
About my reads: I trimmed them using sickle.
(I use the --diploid option because, according to a study, the Phoenix dactylifera genome contains 18 pairs of chromosomes.)
According to another study, the genome size should be around 670 Mb. You can calculate the target sequencing coverage using this estimate of the genome size. These considerations are important for designing the best sequencing strategy and choosing an appropriate assembler.
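To make the coverage calculation concrete, here is a minimal sketch: coverage is roughly the total number of sequenced bases divided by the genome size. It assumes a standard four-line-per-record FASTQ file and uses the ~670 Mb figure mentioned above; the function names are my own and not part of any tool.

```python
import gzip
import sys

# Assumed genome size (~670 Mb), taken from the estimate quoted above.
GENOME_SIZE_BP = 670_000_000


def total_bases(fastq_path):
    """Sum read lengths in a FASTQ file with four lines per record."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    total = 0
    with opener(fastq_path, "rt") as handle:
        for i, line in enumerate(handle):
            # The sequence is the second line of each four-line record.
            if i % 4 == 1:
                total += len(line.strip())
    return total


def estimated_coverage(n_bases, genome_size_bp):
    """Approximate average coverage: sequenced bases / genome size."""
    return n_bases / genome_size_bp


if __name__ == "__main__" and len(sys.argv) > 1:
    bases = total_bases(sys.argv[1])
    cov = estimated_coverage(bases, GENOME_SIZE_BP)
    print(f"{bases} bases, approx. {cov:.1f}x coverage")
```

You could save this under any name and run it with the path to trimmed.fastq as the only argument. The result is only a rough average, since it ignores contaminants, organellar reads, and uneven coverage.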
Why do you want to assemble if there is a reference genome available? If all the data you have at hand are these short reads (length 20-397), your assembly will most likely be worse than the published genome. What analyses do you intend to perform downstream? I have the feeling that mapping to this reference genome would be a better approach.