Different Problems with Velvet
9.8 years ago
stu111392 ▴ 30

Hello Folks,

Let's start from the beginning so that you get an idea of what I'm doing. My input files are paired-end reads, which I put through a module that clips and merges them (it's an ancient DNA sample). After that I filtered the result with DeconSeq and got one FASTQ file of "clean" reads, which is about 3 GB and contains 14,639,466 reads.
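
Roughly, the preprocessing looked like this (I've simplified the names, and I'm writing AdapterRemoval-style syntax as an example — the actual module, database name, and file names may differ):

AdapterRemoval --file1 sample_R1.fastq --file2 sample_R2.fastq --collapse --basename sample_merged
# merged (collapsed) read pairs end up in sample_merged.collapsed
deconseq.pl -f sample_merged.collapsed -dbs hsref
# screens the reads against the chosen contaminant database (here the human reference)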

Now I want to do a metagenomic assembly from the clean FASTQ file. The read lengths range from 25 to 192 bp, so I compiled Velvet with the following parameters:

make 'CATEGORIES=1' 'MAXKMERLENGTH=17' 'OPENMP=1'

I chose this k-mer length because I wanted to include all the reads in my file, and OPENMP to enable multithreading. I am not entirely sure what the CATEGORIES option does, but from what I gathered from the manual it seems fine to set it to one.
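
One thing worth knowing: running velveth with no arguments prints its usage text, which (as far as I recall) states the maximum hash length the binary was compiled with, so you can verify that the compile options took effect:

./velveth   # no arguments: the usage output shows the compiled-in hash-length limit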

Then I executed velveth with this command line:

./velveth out-dir 17 -fastq /path/to/the/file

It created the initial files. After that I reserved a lot of memory (100 GB) and tried to execute velvetg with the following parameters:

./velvetg out-dir -exp_cov auto

Then the program starts, and so far I've seen two possible outcomes. One is this strange error I don't understand:

Scanning pre-graph file out-dir/PreGraph for k-mers
velvetg: Key length 24 greater than max allowed value (17) Recompile Velvet to deal with this word length.: No such file or directory

Why is there suddenly a greater key length? I thought the only thing velveth does is construct a hash table. So I recompiled with 'MAXKMERLENGTH=27' and ran it again.

Then I encountered the second outcome: PBS killed the job because the memory limit was exceeded.

Any thoughts on this? Anybody with experience in metagenomic assembly? How much memory do you think I need for this task, or am I doing something fundamentally wrong? I would ask somebody in our group, but I am the only one here working with Velvet, so I have no one to turn to.
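
For context, the submission looks roughly like the sketch below (a typical Torque/PBS script; directive syntax varies by cluster, and the numbers are placeholders):

#!/bin/bash
#PBS -l nodes=1:ppn=8       # one node, eight cores
#PBS -l mem=100gb           # hard memory limit; the job is killed above this
#PBS -l walltime=24:00:00
cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=8    # Velvet built with OPENMP=1 reads this
./velvetg out-dir -exp_cov auto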

As always with kind regards and thanks in advance,

Julian

software-error Assembly velvet • 5.7k views

IIRC Velvet is a crazy memory hog; I remember allocating 244 GB to run velvetg. Quick question: have you tried digital normalization before assembly? It makes the task computationally less expensive. Check out http://ged.msu.edu/papers/2012-diginorm/
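
A minimal sketch of that with khmer's normalize-by-median.py (the -k/-C values are the defaults used in the paper; -N and -x size the hash tables, and the file name is a placeholder):

normalize-by-median.py -k 20 -C 20 -N 4 -x 2e9 clean_reads.fastq
# writes clean_reads.fastq.keep, dropping reads whose median 20-mer coverage already exceeds 20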


No, I have not, but I'll check it out. Thank you for the hint.


I had the same problem (a "core dump" or something similar), and it turned out that you need to place the sequence files in the same folder as the software to make it run, even if you have written out the full path to wherever you left the file.

Here is my example:

Computer@Computer:~/Descargas/velvet_1.2.10$ ./velveth velvet5 31 -fastq.gz -shortPaired ~/Descargas/SRR292770_1.fastq.gz ~/Descargas/SRR292770_2.fastq.gz
9.8 years ago
Vivek Todur ▴ 60

Hi,

Velvet is designed for assembling a single genome. If you really want to stick with Velvet, try MetaVelvet, which was developed for this purpose. Otherwise, there is an even better pipeline, MOCAT, for assembly, gene prediction, and classification.

Link: http://vm-lux.embl.de/~kultima/MOCAT/
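
As a sketch, the MetaVelvet pipeline described in its docs looks roughly like this (k and file names are placeholders; -read_trkg yes keeps the read tracking that meta-velvetg needs):

./velveth out-dir 31 -fastq clean_reads.fastq    # k=31 needs MAXKMERLENGTH >= 31 at compile time
./velvetg out-dir -exp_cov auto -read_trkg yes
./meta-velvetg out-dir                           # partitions the de Bruijn graph by coverage peaks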

Good Luck


Thanks for the advice, but the MetaVelvet manual clearly states that you have to run velveth and velvetg from "normal" Velvet before you can use MetaVelvet. As far as I understand the process, you take the initial de Bruijn graph built by velvetg, and MetaVelvet divides this graph into subgraphs for each organism in your sample. So I think the workflow is right. But I'll have a look at MOCAT.

9.8 years ago
stu111392 ▴ 30

I looked into the principle of digital normalization, as suggested by RamRS, for my data, but while reading the paper another question came up. What digital normalization basically does is trim your k-mer coverage down to a certain value; in the process, errors are removed and redundant data is discarded. But the MetaVelvet paper says that the division of the Velvet graph is done by looking at k-mer frequencies, and if you normalize beforehand there are no longer large differences in those frequencies.

As far as I can tell, using normalization before de novo assembly of a metagenomic sample is therefore not a good idea. Any thoughts on this?


You should ditch Velvet for something better like IDBA-UD.
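
For reference, an IDBA-UD run might look like this sketch (the IDBA tools want FASTA input, hence fq2fa; file names, k range, and thread count are placeholders):

fq2fa clean_reads.fastq clean_reads.fasta
idba_ud -r clean_reads.fasta -o idba_out --mink 25 --maxk 65 --num_threads 8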

9.8 years ago

You can make this work with Velvet. I struggled too (and asked many questions on Biostars), but I managed to assemble my metagenome with various k-mer lengths of 21, 31, and 41.

Might I also suggest adding -short to the velveth command? So it would be:

velveth out-dir 17 -short -fastq /path/to/the/file

And then run the velvetg command as before?

Stay strong.


Hey, thank you for your suggestion. I'll try it out, but at the moment it seems I'll be using SPAdes for further assemblies. By the way, I asked in the MetaVelvet Google Group how to declare my reads, and they should be declared as short paired. In the next few days I will get access to a cluster with much, much more power, and if I find the time I will try assembling with Velvet as well.
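
Something like this single-library call is what I have in mind (flags per the SPAdes manual; file names and limits are placeholders, and -m caps RAM in GB):

spades.py -s clean_reads.fastq -o spades_out -t 8 -m 100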

Best wishes
