From hearsay I know that de Bruijn graphs of large genomes (e.g. human) are usually constructed with k = 51, or that k = 51 is at least a good initial choice.
However, I am unable to find any source for this. Does anyone know where it comes from?
Thanks for the detailed answer! The application would be genome assembly from short reads. Actually, what we are doing is storing a k-mer set in small space, so the question is a very general one about any kind of k-mer-based method. That probably makes it hard to answer, though.
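To make "a k-mer set" concrete, here is a minimal Python sketch of how such a set could be collected from short reads. The function names `reverse_complement` and `canonical_kmers` are made up for illustration. Treating a k-mer and its reverse complement as one (canonical) k-mer is an assumption about the setting, and this is not how any particular assembler implements it.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    comp = str.maketrans("ACGT", "TGCA")
    return seq.translate(comp)[::-1]


def canonical_kmers(reads, k):
    """Collect the set of canonical k-mers occurring in the given reads.

    A k-mer is stored as the lexicographically smaller of itself and its
    reverse complement (an assumption; strand-specific methods differ).
    K-mers containing characters other than A, C, G, T are skipped.
    """
    kmers = set()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                kmers.add(min(kmer, reverse_complement(kmer)))
    return kmers


if __name__ == "__main__":
    # Toy input; real reads would come from a FASTQ file and k would be e.g. 51.
    print(sorted(canonical_kmers(["ACGTAC"], 3)))
```

One thing this makes visible: with an odd k such as 51, no k-mer can equal its own reverse complement, which is one practical reason odd values are commonly chosen.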
Well, unfortunately, the nitty-gritty details required for algorithm design escape me. But I would recommend taking a look at:
In general, though, high-quality genome assemblies nowadays use a combination of short reads and long reads or Hi-C data. No amount of k-mer size optimization is going to give you gains in assembly quality comparable to those from incorporating this additional information.