Hello Folks,
let's start from the beginning so that you get an idea of what I'm doing. My first files are paired-end reads, which I put through a module that clips and merges them (it's an ancient DNA sample). After that I filtered the result with DeconSeq and got one FASTQ file of "clean" reads, which is about 3 GB and contains 14,639,466 reads.
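For reference, the DeconSeq step was roughly along these lines (the database name and file paths here are placeholders, not my exact call):

perl deconseq.pl -f merged_reads.fastq -dbs hsref -out_dir deconseq-out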
Now I want to do a metagenomic assembly from the clean FASTQ file. The read lengths range from 25 to 192 bp. So I compiled Velvet with the following parameters:
make 'CATEGORIES=1' 'MAXKMERLENGTH=17' 'OPENMP=1'
I used this k-mer length because I wanted to include all the reads in my file (Velvet ignores reads shorter than k), and OPENMP to enable multithreading. I am not entirely sure what the CATEGORIES option does, but from what I gather from the manual it seems OK to set it to 1 since I only have a single read library.
Then I ran velveth with this command line:
./velveth out-dir 17 -fastq /path/to/the/file
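As far as I understand, single-end reads default to the "short" category anyway, so the explicit form of that same command should be:

./velveth out-dir 17 -fastq -short /path/to/the/file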
And it creates the initial files. After that I reserved a lot of memory (100 GB) and tried to run velvetg with the following parameters:
./velvetg out-dir -exp_cov auto
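For completeness: velvetg also accepts a coverage cutoff and a minimum contig length, which I may add later once it runs at all (the 100 is just an example value):

./velvetg out-dir -exp_cov auto -cov_cutoff auto -min_contig_lgth 100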
Then the program starts, and so far I've seen two possible outcomes. One is this strange error that I don't understand:
Scanning pre-graph file out-dir/PreGraph for k-mers
velvetg: Key length 24 greater than max allowed value (17) Recompile Velvet to deal with this word length.: No such file or directory
Why is there suddenly a greater key length? I thought the only thing velveth does is construct a hash table. So I recompiled Velvet with 'MAXKMERLENGTH=27' and ran it again.
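For the record, the recompile was along these lines; as far as I know a make clean is needed first so that no object files from the old MAXKMERLENGTH build get reused (and Velvet only uses odd k-mer lengths anyway, so 27 should be fine):

make clean
make 'CATEGORIES=1' 'MAXKMERLENGTH=27' 'OPENMP=1'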
Then I encountered the second outcome: PBS killed the job with "Memory Limit exceeded".
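For context, I request the memory through the usual PBS directives in my job script, roughly like this (the node, core, and walltime values are placeholders for my cluster):

#PBS -l nodes=1:ppn=8
#PBS -l mem=100gb
#PBS -l walltime=48:00:00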
Any thoughts on this? Does anybody have experience with metagenomic assembly? How much memory do you think I need for this task, or am I doing something fundamentally wrong? I would ask somebody in our group, but I am the only one here working with Velvet, so I have no one to ask.
As always, with kind regards and thanks in advance,
Julian
IIRC Velvet is a crazy memory hog. I remember allocating 244 GB to run velvetg. Quick question: have you tried digital normalization before assembly? It makes the task computationally much less expensive. Check out http://ged.msu.edu/papers/2012-diginorm/
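If you try it, the khmer implementation is the usual route. A minimal sketch for single-end reads, using the typical parameters from the paper (clean_reads.fastq stands for your DeconSeq output):

normalize-by-median.py -k 20 -C 20 -N 4 -x 2e9 clean_reads.fastq

That writes clean_reads.fastq.keep, which you would then feed to velveth instead of the original file.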
No, I have not, but I'll check it out. Thank you for the hint.
I had a similar problem (a "core dump" or something like it), and in my case the fix was that the sequence files had to be located in the same folder as the software for it to run, even though I had written the correct path to the file wherever I had left it.
Here is my example: