Hi all, I am puzzled about the "-kmer" option in de novo assembly.
First, I ran a k-mer frequency analysis, which reported:
For P(x): Possible peaks including: 100 the unique peak is 100
For F(x): Possible peaks including: 10 103 the unique peak is 103
Raw kmer depth estimation:
Curve              peak  expect_depth
k-mer species      100   100.687
k-mer individuals  103   102.643
So I concluded that the k-mer depth of my data is about 101, and I assumed I should use this value in the downstream analysis.
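As an illustration of how a single depth like ~101 is read off a k-mer frequency histogram, here is a minimal Python sketch. It assumes the histogram is available as a mapping from depth to the number of k-mers seen at that depth; the function name `main_peak` and the `min_depth` cutoff are hypothetical, not part of the tool that produced the report above, and the toy histogram is synthetic, not the poster's data. Low depths are skipped because that region is dominated by k-mers containing sequencing errors.

```python
def main_peak(hist, min_depth=5):
    """Return the depth with the most k-mers, ignoring depths below min_depth.

    hist maps depth -> number of distinct k-mers observed at that depth.
    Low depths are skipped because they are dominated by k-mers that
    contain sequencing errors.
    """
    candidates = {d: n for d, n in hist.items() if d >= min_depth}
    if not candidates:
        raise ValueError("no depths at or above min_depth")
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    # Synthetic toy histogram, NOT the poster's data: an error tail at
    # depths 1-3 and a main peak near 100.
    toy_hist = {1: 5000, 2: 800, 3: 200, 98: 900, 100: 1000, 102: 950, 103: 940}
    print(main_peak(toy_hist))  # 100
    # Averaging the two reported peaks (100 and 103) gives one depth estimate.
    print((100 + 103) / 2)      # 101.5
```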
Next, I started correcting sequencing errors and trimming reads that contain singleton k-mers with bfc. My boss advised me to simply set -kmer to 61 (my data are 2 × 100 bp paired-end reads), and I have also read a paper that used -kmer 61. So is it right to just set the k-mer value to 61? Does this choice have nothing to do with my own data? If so, why? Thank you.
Yingzi
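One reason the choice of k still depends on the data: the depth above is measured in k-mers at whatever k the frequency analysis used, and the expected k-mer depth shrinks as k grows. In an idealized model (error-free reads of length L, ignoring overlap between pairs), each read yields L - k + 1 k-mers, so C_kmer = C_base × (L - k + 1) / L. The minimal sketch below applies that relation; the counting k of 17 is an assumption, since the post does not say which k the frequency analysis used, and the function names `base_depth` and `kmer_depth_at` are hypothetical.

```python
def base_depth(kmer_depth, k, read_len):
    """Convert a k-mer depth measured at length k into per-base depth.

    Idealized model: each read of length read_len contributes
    read_len - k + 1 k-mers, so C_kmer = C_base * (read_len - k + 1) / read_len.
    """
    if not 1 <= k <= read_len:
        raise ValueError("k must lie between 1 and read_len")
    return kmer_depth * read_len / (read_len - k + 1)


def kmer_depth_at(kmer_depth, k_from, k_to, read_len):
    """Re-express a k-mer depth measured at k_from as the expected depth at k_to."""
    if not 1 <= k_to <= read_len:
        raise ValueError("k_to must lie between 1 and read_len")
    c_base = base_depth(kmer_depth, k_from, read_len)
    return c_base * (read_len - k_to + 1) / read_len


if __name__ == "__main__":
    READ_LEN = 100  # from the post: 100 bp reads
    COUNT_K = 17    # assumption: the post does not state the k used for counting
    print(f"{base_depth(101, COUNT_K, READ_LEN):.1f}")         # 120.2
    print(f"{kmer_depth_at(101, COUNT_K, 61, READ_LEN):.1f}")  # 48.1
```

Under these assumptions, a k-mer depth of about 101 at k = 17 corresponds to roughly 48× k-mer depth at k = 61, which would likely still be ample; with much lower coverage, a large k could leave too few k-mers per position for the assembler to join reads reliably.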
You can also use KmerGenie to find the optimal k range for the assembly.
Generally speaking, though, people tend to set k to about two-thirds of the read length. However, it is always better to build multiple assemblies with different k values and evaluate them against each other.
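The two-thirds rule of thumb and the advice to compare several assemblies can be combined into a small k sweep. This minimal sketch (hypothetical function names `candidate_ks` and `expected_kmer_depth`) centers a grid of odd k values near two-thirds of the read length; odd k is used because many de Bruijn graph assemblers expect it, and an odd-length k-mer can never be its own reverse complement. The span, step, and the per-base depth of 120 are arbitrary illustrative values, not taken from the post.

```python
def candidate_ks(read_len, span=10, step=4):
    """Return odd k values within +/- span of roughly two-thirds of read_len."""
    center = round(read_len * 2 / 3)
    if center % 2 == 0:
        center -= 1  # move to the nearest lower odd value
    ks = range(center - span, center + span + 1, step)
    # Keep only odd k that are valid for this read length.
    return [k for k in ks if k % 2 == 1 and 1 <= k < read_len]


def expected_kmer_depth(base_depth, k, read_len):
    """Idealized k-mer depth: base_depth * (read_len - k + 1) / read_len."""
    return base_depth * (read_len - k + 1) / read_len


if __name__ == "__main__":
    READ_LEN = 100    # from the post: 100 bp reads
    BASE_DEPTH = 120  # hypothetical per-base depth, for illustration only
    print(candidate_ks(READ_LEN))  # [57, 61, 65, 69, 73, 77]
    for k in candidate_ks(READ_LEN):
        print(f"k={k}: ~{expected_kmer_depth(BASE_DEPTH, k, READ_LEN):.1f}x k-mer depth")
```

For 100 bp reads this grid brackets the suggested 61. Each k would then get its own assembly, and the results could be compared with whatever metrics suit the project (for example contiguity statistics or the fraction of reads that map back).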