Hi,
How do I select the best k-mer size for genome size estimation using Jellyfish? I tried both 31 and 41, and they gave me different results. The reads are 2x150 bp. Are there any other tools? I am also trying GCE.
Thanks in advance
You can use Jellyfish as described here. The BBMap suite also has kmercountexact.sh, which can be used for this purpose.
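For what it's worth, here is a rough Python sketch of the usual back-of-the-envelope calculation on a k-mer histogram: genome size is approximately the total number of non-error k-mers divided by the depth of the main peak. It assumes two whitespace-separated columns (multiplicity, count), as written by `jellyfish histo`. The function name and the simple "first valley" error cutoff are my own illustrative choices, not part of any tool, and the sketch ignores heterozygosity and repeat peaks, so treat the number as a sanity check only.

```python
def estimate_genome_size(histo_lines):
    """Rough genome size estimate from 'multiplicity count' histogram lines.

    Assumption: low-multiplicity k-mers before the first valley are
    sequencing errors, and the tallest bin after the valley is the
    homozygous coverage peak.
    """
    hist = []
    for line in histo_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        hist.append((int(parts[0]), int(parts[1])))
    hist.sort()

    # Find the first valley: the bin just before counts start rising again.
    for i in range(1, len(hist)):
        if hist[i][1] > hist[i - 1][1]:
            valley_idx = i - 1
            break
    else:
        raise ValueError("no valley found; histogram has no clear peak")

    kept = hist[valley_idx:]
    # Main peak: the multiplicity with the highest count at or after the valley.
    peak_mult, _ = max(kept, key=lambda mc: mc[1])
    # Total k-mer occurrences, excluding the presumed error k-mers.
    total_kmers = sum(mult * count for mult, count in kept)
    return total_kmers // peak_mult, peak_mult
```

Running it on the histograms for k=31 and k=41 (for example `estimate_genome_size(open("k31.histo"))`, where the file name is hypothetical) lets you compare the peak depth and the estimate side by side. Note that Jellyfish lumps everything above the upper limit into the last bin, so highly repetitive genomes can be underestimated unless that limit is raised.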
Thank you for the response. I was following the first link for genome size estimation. It says that for eukaryotes a k-mer length of 17-31 should be fine in Jellyfish, and the tutorial chose 25. I still don't understand how to choose the k-mer length for counting. kmercountexact.sh uses a k-mer length of 31 by default.
Is 31 a standard k-mer length for any eukaryotic genome?
I tend to use ntCard and then feed the histograms to GenomeScope; both can be run offline.
You just need to modify the files a little: remove the F* lines and change the separator from tab to space.
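A minimal Python sketch of that conversion, in case it helps. It assumes the ntCard histogram has summary lines starting with "F" followed by tab-separated "multiplicity count" lines; the function name is just something I made up for illustration.

```python
def ntcard_to_genomescope(lines):
    """Drop the F* summary lines and turn tab separators into spaces."""
    converted = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip empty lines and the summary lines that start with "F".
        if not line or line.startswith("F"):
            continue
        converted.append(" ".join(line.split("\t")))
    return converted
```

To use it on a file, read the ntCard output with `open(...)`, pass the file object to the function, and write the returned lines (joined with newlines) to a new file for GenomeScope.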
There isn't any particular k-mer size that is golden. It will depend on your species and the data generated, e.g. repetitiveness, long vs. short reads, etc.
There are some good papers on exactly this topic:
I have tried that, and the best k-mer size from KmerGenie was way too high: for this data it was 119. I was not sure whether to use this k-mer size in Jellyfish for genome size estimation.
I don't understand: when KmerGenie already gives you an estimated genome size along with the best k-mer size, why do you want to use Jellyfish again for the same thing? Checking the histo.pdf file generated by KmerGenie carefully should give you the answer you need.