Velvet Assembly Problem
14.0 years ago
Ajw ▴ 30

I ran Velvet with hash lengths 31 and 51 without setting any other parameters. Then I calculated expcov and covcutoff using the provided Perl script (velvet-estimate-exp_cov.pl).

For hash length 31, the script gave expcov and covcutoff values of 1 and 0. I expected these values to drop further for hash length 51 (since a higher k-mer length gives lower k-mer coverage), but surprisingly they were 17 and 0. Could anyone explain this?

velvet assembly coverage • 9.4k views

You can compile Velvet to go higher (by setting MAXKMERLENGTH when running make), and you probably should for modern-day read lengths.


I thought that Velvet doesn't let you go above 31...?

14.0 years ago

velvet-estimate-exp_cov.pl is a script by Torsten Seemann which calculates the mode of the k-mer coverage histogram generated from a Velvet assembly's stats.txt file.
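To make "mode of the k-mer coverage histogram" concrete, here is a rough Python sketch of the idea. It is not a port of the Perl script: I am assuming a tab-separated stats.txt with a header row containing `lgth` and `short1_cov` columns, and I am assuming that weighting each node by its length is a reasonable way to build the histogram. The function name and the sample rows are made up for illustration.

```python
import math
from collections import Counter

def estimate_exp_cov(stats_lines, cov_column="short1_cov", len_column="lgth"):
    """Return the length-weighted mode of the rounded k-mer coverage.

    stats_lines: iterable of tab-separated lines, header first
    (assumed layout of a Velvet stats.txt file).
    """
    rows = iter(stats_lines)
    header = next(rows).rstrip("\n").split("\t")
    cov_idx = header.index(cov_column)
    len_idx = header.index(len_column)

    histogram = Counter()
    for line in rows:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(cov_idx, len_idx):
            continue  # skip short or blank lines
        try:
            length = int(fields[len_idx])
            coverage = float(fields[cov_idx])
        except ValueError:
            continue  # skip rows with non-numeric values
        if not math.isfinite(coverage):
            continue  # skip "Inf"/"NaN" coverage entries
        # Each node contributes its length to the bin of its rounded coverage.
        histogram[round(coverage)] += length

    if not histogram:
        raise ValueError("no usable rows in stats data")
    # Heaviest bin wins; ties go to the lower coverage value.
    return max(histogram.items(), key=lambda kv: (kv[1], -kv[0]))[0]

# Made-up rows for illustration only, not data from the question.
sample_stats = [
    "ID\tlgth\tout\tin\tlong_cov\tshort1_cov\tshort1_Ocov\tshort2_cov\tshort2_Ocov",
    "1\t1200\t1\t1\t0\t17.2\t17.2\t0\t0",
    "2\t800\t1\t1\t0\t16.8\t16.8\t0\t0",
    "3\t50\t1\t1\t0\t1.1\t1.1\t0\t0",
    "4\t30\t1\t1\t0\t0.9\t0.9\t0\t0",
    "5\t400\t1\t1\t0\t18.4\t18.4\t0\t0",
]
print(estimate_exp_cov(sample_stats))
```

With real data you would pass the open file handle for stats.txt instead of the sample list. Note that if most of the assembled length sits in low-coverage nodes, the mode lands on a very low value, so it is worth looking at the whole histogram rather than trusting a single number.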

kmer-coverage is proportional to the x-coverage statistic a biologist would typically use (read bp per contig bp) except it is measured in kmers (read kmers per contig kmer) in order to provide job security for bioinformatics programmers.
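For reference, the Velvet manual gives the conversion as Ck = C * (L - k + 1) / L, where C is the nucleotide coverage, L the read length and k the hash length. A small sketch of that formula follows; the 30x coverage and 100 bp read length are assumed values for illustration, since the question does not give them.

```python
def kmer_coverage(base_coverage, read_length, k):
    """Convert nucleotide coverage C to k-mer coverage Ck = C * (L - k + 1) / L."""
    if k > read_length:
        raise ValueError("k cannot exceed the read length")
    return base_coverage * (read_length - k + 1) / read_length

# Assumed values for illustration: 30x coverage, 100 bp reads.
for k in (31, 51):
    print(k, kmer_coverage(30, 100, k))
```

For a fixed data set the k-mer coverage can only go down as k goes up, which is the expectation stated in the question.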

You need to choose the optimal kmer and cov_cutoff settings based on your tolerances for an assembly in a similar way you would choose anything in life - i.e. are you willing to accept some bad with the good in order to get more of it?

We are missing some details. How many contigs are you getting out of these assemblies? What is the size of the assemblies they are generating? What is the size of the organism you are assembling?

31 and 51 are radically different kmer values. If your kmer value is low there will be more chances for reads to overlap but also many path ambiguities in your graph and your assembly will be very fragmented (but very large). If your kmer value is high you will have a very stringent, small assembly, with a higher N50. The coverage cutoff is another dial you can turn to filter out low coverage contigs that might not be real.
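To illustrate the coverage cutoff "dial", here is a minimal sketch that drops low-coverage contigs and compares N50 before and after. This is a post-hoc filter on a list of (length, coverage) pairs, not what velvetg does internally when it applies the cutoff to the graph; the function names and contig values are hypothetical.

```python
def n50(lengths):
    """Length of the contig at which the running total reaches half the assembly size."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs given")

def apply_cov_cutoff(contigs, cov_cutoff):
    """Keep only (length, coverage) pairs whose coverage is at least cov_cutoff."""
    return [(length, cov) for length, cov in contigs if cov >= cov_cutoff]

# Made-up (length, k-mer coverage) pairs for illustration only.
contigs = [(1200, 17.2), (800, 16.8), (400, 18.4), (50, 1.1), (30, 0.9)]

kept = apply_cov_cutoff(contigs, cov_cutoff=5)
print("before:", len(contigs), "contigs, N50 =", n50([l for l, _ in contigs]))
print("after: ", len(kept), "contigs, N50 =", n50([l for l, _ in kept]))
```

The trade-off is the one described above: a higher cutoff removes contigs that may be errors, but it also shrinks the assembly, and N50 can rise simply because short contigs were discarded.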

expcov reportedly drives scaffolding and homologue splitting (distinguishing between members of a gene family and outright repeats), but I find that adjusting it independently of covcutoff usually has little effect on an assembly.
