Question

best kmer for genome size estimation

0

7.0 years ago

popayekid55 ▴ 110

Hi,

How do I select the best k-mer size for genome size estimation using Jellyfish? I tried both 31 and 41, and they gave me different results. The reads are 2 × 150 bp. Are there any other tools? I am also trying GCE.

Thanks in advance

genome kmer • 6.3k views

ADD COMMENT • link updated 4.9 years ago by harish ▴ 470 • written 7.0 years ago by popayekid55 ▴ 110

score 1 · Answer 1 · 2017-12-07

1

7.0 years ago

Tm ★ 1.1k

Hello,

You can give KmerGenie a try.

It mostly works for me.

1

I have tried that, and the best k-mer from KmerGenie was way too high: for this data it was 119. I was not sure whether to use this k-mer in Jellyfish for genome size estimation.

ADD REPLY • link 7.0 years ago by popayekid55 ▴ 110

0

I don't understand: KmerGenie already gives you an estimated genome size along with the best k-mer, so why do you want to run Jellyfish again for the same thing? Checking the histo.pdf file generated by KmerGenie carefully should give you the answer you need.

ADD REPLY • link 7.0 years ago by Tm ★ 1.1k

score 1 · Answer 2 · 2017-12-07

1

7.0 years ago

GenoMax 147k

You can use Jellyfish as described here. The BBMap suite also has kmercountexact.sh, which can be used for this purpose.
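For reference, the usual histogram-based estimate divides the total number of non-error k-mers by the depth of the main peak. Below is a minimal Python sketch of that idea, assuming a two-column histogram of multiplicity and count (the kind of table a k-mer counter writes out). The function names and the simple "first valley" rule for discarding error k-mers are illustrative assumptions, not part of Jellyfish or BBMap, and the sketch ignores heterozygosity and repeats.

```python
def read_histogram(path):
    """Read a whitespace-separated 'multiplicity count' table into a list of pairs."""
    pairs = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:
                pairs.append((int(fields[0]), int(fields[1])))
    return pairs


def estimate_genome_size(histogram):
    """Return (estimated_size, peak_multiplicity) from (multiplicity, count) pairs.

    Assumption: low-multiplicity k-mers before the first valley are sequencing
    errors and are excluded from the total.
    """
    counts = dict(histogram)
    mults = sorted(counts)

    # The first point where counts start rising again marks the valley
    # between the error k-mers and the main coverage peak.
    valley = mults[0]
    for prev, cur in zip(mults, mults[1:]):
        if counts[cur] > counts[prev]:
            valley = prev
            break

    solid = [m for m in mults if m >= valley]
    peak = max(solid, key=lambda m: counts[m])

    # Total k-mer occurrences divided by peak depth approximates genome size.
    total_kmers = sum(m * counts[m] for m in solid)
    return total_kmers // peak, peak
```

Running this on histograms produced with several k values and comparing the estimates is one way to see how sensitive the result is to k.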

0

Thank you for the response. I was following the first link for genome size estimation. It mentions that for eukaryotes a k-mer length of 17-31 should be fine in Jellyfish; in the tutorial they chose 25. I still don't understand how to choose the k-mer length for counting. kmercountexact.sh uses a k-mer length of 31 by default.

Is 31 a standard k-mer length for any eukaryotic genome?

ADD REPLY • link 7.0 years ago by popayekid55 ▴ 110

score 1 · Answer 3 · 2019-12-13

I tend to use ntCard and then feed the histograms to GenomeScope; both can be run offline.

You just need to modify the histogram files a little: remove the F* lines and change the separator from tab to space.
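That conversion can be sketched in a few lines of Python. This assumes the histogram has summary lines starting with "F" followed by tab-separated multiplicity/count rows, as described above; check your own file first, since the layout may differ between versions. The function name is hypothetical.

```python
def ntcard_to_genomescope(lines):
    """Drop the F* summary lines and turn tab-separated rows into space-separated ones."""
    converted = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and the F* summary lines.
        if not line or line.startswith("F"):
            continue
        converted.append(" ".join(line.split("\t")))
    return converted


if __name__ == "__main__":
    import sys

    # Usage: python convert.py < input.hist > output.histo
    for row in ntcard_to_genomescope(sys.stdin):
        print(row)
```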

There isn't any particular k-mer size that is golden. It depends on your species and the data generated, i.e. repetitiveness, long vs. short reads, etc.
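One data-dependent effect is easy to quantify: for a fixed read length, a larger k leaves fewer k-mers per read, so the k-mer coverage (and with it the histogram peak) drops. A small sketch using the standard relation between base coverage and k-mer coverage follows; the 50x base coverage is a made-up figure for illustration, while the read length of 150 and the k values come from this thread.

```python
def kmer_coverage(base_coverage, read_length, k):
    """Expected k-mer coverage: base coverage scaled by (L - k + 1) / L."""
    if k > read_length:
        raise ValueError("k cannot exceed the read length")
    return base_coverage * (read_length - k + 1) / read_length


if __name__ == "__main__":
    # Hypothetical 50x base coverage with 150 bp reads.
    for k in (21, 31, 41, 119):
        print(k, round(kmer_coverage(50, 150, k), 1))
```

With 150 bp reads, k = 119 keeps only 32 of every 150 positions as k-mer starts, so the peak sits much closer to the error k-mers than it does at k = 31.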

score 0 · Answer 4 · 2019-12-12

0

4.9 years ago

andorjkiss ▴ 50

There are some good papers on exactly this topic:
