What Happens When The K-Mer Size Is Larger Than The Trimmed Reads Size In Velvet Assembly?
11.7 years ago
Rahul Sharma ▴ 660

Hi all,

I am assembling a 120 Mb genome from 5 libraries with different insert sizes: 300 bp, 1 kb, 8 kb, 20 kb, and singletons. The first two libraries are from the Illumina Genome Analyzer (read length: 76 bp) and the last two are from HiSeq (read length: 100 bp). After trimming, the mean read lengths are 55 bp for the GA runs and 87 bp for the HiSeq runs. I want to run the assemblies with Velvet. Would k-mer sizes of 35, 45, 55, 65 and 75 create any issues, given that my trimmed read lengths vary so much? Is it fine to assemble the GA and HiSeq reads together, or should I assemble them separately and merge the assemblies later? I would appreciate any comments.

Regards

velvet • 6.6k views
11.7 years ago

I don't know first-hand, but I recall people stating that it can't work, because the method cannot build k-mers of the required length from reads that are shorter than k.

This is stated, for example, in a blog post from Homolog.us: http://www.homolog.us/blogs/2012/10/10/multi-kmer-de-bruijn-graphs/
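The arithmetic behind that statement can be sketched in a few lines of Python. A read of length L contains L - k + 1 k-mers, so any read shorter than k contributes none. The function names below are made up for illustration and are not part of Velvet, and the mean trimmed lengths from the question (55 and 87 bp) are used only as stand-ins for real per-read lengths, which will vary around those means.

```python
def kmers_per_read(read_length, k):
    """Number of k-mers one read of this length can contribute (0 if shorter than k)."""
    return max(0, read_length - k + 1)


def usable_fraction(read_lengths, k):
    """Fraction of reads that are at least k bases long."""
    if not read_lengths:
        return 0.0
    usable = sum(1 for n in read_lengths if n >= k)
    return usable / len(read_lengths)


if __name__ == "__main__":
    # Mean trimmed lengths from the question, used here as illustrative values only.
    ga_mean, hiseq_mean = 55, 87
    for k in (35, 45, 55, 65, 75):
        print(k, kmers_per_read(ga_mean, k), kmers_per_read(hiseq_mean, k))
```

Under these assumptions, a 55 bp read yields no k-mers at k = 65 or k = 75, so at those sizes most of the GA reads would probably be ignored while the longer HiSeq reads would still contribute. Running usable_fraction on the actual trimmed read lengths would show how much of each library survives at a given k.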

More general information on k and other parameters can be found on Titus Brown's blog; in fact, all of its pages tagged as assembly are worth consulting.

11.1 years ago
SES 8.6k

I was curious about this because I use Velvet a lot, so I tested it. There is no explicit warning from velveth, but you can tell in a couple of ways that no overlaps were found. First, look at the Roadmaps file: if you choose a k-mer size larger than your read lengths, the number of roadmaps will equal the number of input sequences. Another way is to run velvetg and look at the graph it produces. If it runs rather quickly and ends with something like:

...
[155.488308] EMPTY GRAPH
Final graph has 0 nodes and n50 of 0, max 0, total 0, using 0/20198538 reads

Then you have a clear indication that no overlaps were found for that hash length. Because read lengths vary, I think all the sequences would have to be processed before such a warning could be issued. Still, it would probably be helpful to warn about this after the pre-processing stage, or to fall back to a hash length shorter than the reads before working on the Roadmaps.
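Since velvetg does not warn explicitly, one option is to check its log yourself. Here is a minimal Python sketch that looks for the two lines quoted above. The regular expression is modelled only on that single summary line, so other Velvet versions may word it differently, and the function names are hypothetical helpers, not anything Velvet provides.

```python
import re

# Pattern modelled on the summary line quoted above (assumption: the wording
# is the same in the Velvet version you run).
SUMMARY_RE = re.compile(
    r"Final graph has (\d+) nodes and n50 of (\d+), max (\d+), "
    r"total (\d+), using (\d+)/(\d+) reads"
)


def parse_velvetg_summary(log_text):
    """Return the numbers from the final summary line as a dict, or None if absent."""
    match = SUMMARY_RE.search(log_text)
    if match is None:
        return None
    keys = ("nodes", "n50", "max", "total", "reads_used", "reads_total")
    return dict(zip(keys, map(int, match.groups())))


def looks_empty(log_text):
    """True if the log reports an empty graph or a final graph with zero nodes."""
    summary = parse_velvetg_summary(log_text)
    zero_nodes = summary is not None and summary["nodes"] == 0
    return "EMPTY GRAPH" in log_text or zero_nodes


if __name__ == "__main__":
    example = (
        "[155.488308] EMPTY GRAPH\n"
        "Final graph has 0 nodes and n50 of 0, max 0, total 0, "
        "using 0/20198538 reads\n"
    )
    print(parse_velvetg_summary(example))
    print(looks_empty(example))
```

Capturing the velvetg output to a file and passing its contents to looks_empty would let a wrapper script skip or flag hash lengths that produced nothing, which is handy when sweeping several k values.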
