I have run Velvet with hash lengths 31 and 51 without setting any parameters. Then I calculated exp_cov and cov_cutoff using the provided perl script (velvet-estimate-exp_cov.pl).
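In case it helps, this is roughly the workflow I followed (a sketch; reads.fastq and the directory name are placeholders, and I am assuming the script is pointed at the stats.txt that velvetg writes):

    velveth asm31 31 -fastq -short reads.fastq
    velvetg asm31
    perl velvet-estimate-exp_cov.pl asm31/stats.txt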
For hash length 31, I found the velvetg parameters exp_cov and cov_cutoff to be 1 and 0. I expected these values to go down further for hash length 51 (since a higher k-mer length gives lower k-mer coverage), but surprisingly they were 17 and 0.
Could anyone explain this?
velvet-estimate-exp_cov.pl is a script by Torsten Seemann which calculates the mode of the k-mer coverage histogram generated from a Velvet assembly's stats.txt file.
k-mer coverage is proportional to the x-coverage statistic a biologist would typically use (read bp per contig bp), except it is measured in k-mers (read k-mers per contig k-mer) in order to provide job security for bioinformatics programmers.
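As a back-of-the-envelope sketch of that relationship (C = base coverage, L = read length, k = hash length; this is the standard conversion given in the Velvet docs):

    Ck = C * (L - k + 1) / L

For example, 20x base coverage with 100 bp reads gives Ck = 20 * (100 - 31 + 1) / 100 = 14x at k = 31, but 20 * (100 - 51 + 1) / 100 = 10x at k = 51, which is why you would normally expect the k = 51 estimate to be lower, not higher.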
You need to choose the optimal k-mer and cov_cutoff settings based on your tolerances for an assembly, in the same way you would choose anything in life: are you willing to accept some bad with the good in order to get more of it?
We are missing some details. How many contigs are you getting out of these assemblies? What is the size of the assemblies they are generating? What is the size of the organism you are assembling?
31 and 51 are radically different k-mer values. If your k-mer value is low, there will be more chances for reads to overlap, but also many path ambiguities in your graph, and your assembly will be very fragmented (but very large). If your k-mer value is high, you will have a very stringent, small assembly with a higher N50. The coverage cutoff is another dial you can turn, to filter out low-coverage contigs that might not be real.
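If you want to see that trade-off empirically, a quick sweep is cheap (a sketch, assuming single-end FASTQ reads in reads.fastq and a Velvet binary built with MAXKMERLENGTH >= 51; velvetg prints the node count and n50 of each assembly on its final line):

    for k in 21 31 41 51; do
        velveth asm_k$k $k -fastq -short reads.fastq
        velvetg asm_k$k | tail -1
    done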
exp_cov reportedly drives scaffolding and homologue splitting (distinguishing between members of a gene family and outright repeats), but I find adjusting it independently of cov_cutoff usually has little effect on an assembly.
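Both are just velvetg flags, so it is cheap to re-run velvetg on an existing velveth directory and try them together and separately (the numbers here are illustrative, not recommendations):

    velvetg asm_k31 -exp_cov 17 -cov_cutoff 5

    # recent Velvet versions can also estimate both for you:
    velvetg asm_k31 -exp_cov auto -cov_cutoff auto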
I thought that Velvet doesn't let you go above 31...?
You can compile Velvet to go higher, and you probably should for modern-day read lengths.
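The maximum k is a compile-time constant, so you rebuild from source with a bigger limit, e.g. (assuming a Velvet source checkout):

    make 'MAXKMERLENGTH=63'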