Hello
I am using 64-bit Ubuntu 14.04 LTS with 32 GB of RAM, running Velvet v1.2.10 and MetaVelvet v1.2.02 compiled with MAXKMERLENGTH=31. I have the following problem: MetaVelvet gets stuck at the "Pebble resolution" (scaffolding) step when trying to assemble contigs from a test dataset of 100,000 Illumina MiSeq 2x300 PE reads, which were not processed in any way (no trimming, adapter removal, etc.). After reaching the "Starting pebble resolution..." step, meta-velvetg hangs for an indefinite amount of time with one core at 100% load. The last time I tested it, I waited more than 12 hours before killing the process manually.
I also tried using MetaVelvet as one of the assemblers in the MetAMOS pipeline, and it froze with the same symptoms at (probably) the same step.
$ meta-velvetg Assem -ins_length 350
[meta-velvetg] Check command line options...
OK. Your command line options seem to be good.
[meta-velvetg] Load meta-graph ...
[0.000001] Reading read set file Assem/Sequences;
[0.048786] 200000 sequences found
[0.222462] Done
[0.593372] Reading graph file Assem/Graph2
[0.593503] Graph has 148377 nodes and 200000 sequences
[meta-velvetg] Category = 'short1' Ave. = 350, SD = 35
[meta-velvetg] Category = 'short2' Ave. = -1, SD = -1
[meta-velvetg] Category = 'long' Ave. = -1, SD = -1
[meta-velveth] ...done (load meta-graph).
[meta-velvetg] Estimate coverage parameters...
[mate-velvetg] Estimate expected coverage ... yes. Expected coverage = 1.87805
[mate-velvetg] Estimate expected coverages ... yes.
[1.431262] Writing into stats file Assem/meta-velvetg.Graph2-stats.txt...
[MetaHisto] First valley = 2
[MetaHisto] Largest peak coverage = 2 (frequency count = 6.35635e+06)
[MetaHisto] Noise cutoff coverage = 635635
[meta-velvetg] Warning: Can't find multiple coverage peaks.
[meta-velvetg] Trun on single coverage peak mode.
[MetaGraph] 1-th coverage peak = 1.87805
[meta-velvetg] Estimate coverage cutoff ... yes. Coverage cutoff = 0.939024
[meta-velvetg] ...done (estimate coverage parameters).
[meta-velvetg] Remove low & high coverage nodes ...
[meta-velvetg] Min. coverage cutoff for short reads = 0.939024
[meta-velvetg] Min. coverage cutoff for long reads = -1
[meta-velvetg] Max. coverage cutoff for short & long reads = -1
[meta-velvetg] Min. contig length = -1
[VelvetGraph] === Remove low coverage nodes ===
[1.642372] Removing contigs with coverage < 0.939024...
[1.645049] Concatenation...
[1.655673] Renumbering nodes
[1.655685] Initial node count 148377
[1.656127] Removed 0 null nodes
[1.656145] Concatenation over!
[1.659085] Concatenation...
[1.669341] Renumbering nodes
[1.669351] Initial node count 148377
[1.669692] Removed 0 null nodes
[1.669698] Concatenation over!
[VelvetGraph] === Remove high coverage nodes ===
[VelvetGraph] === Clip tips hardly ===
[1.669714] Clipping short tips off graph, drastic
[1.675634] Concatenation...
[1.686590] Renumbering nodes
[1.686609] Initial node count 148377
[1.686959] Removed 0 null nodes
[1.686967] Concatenation over!
[1.686984] 148377 nodes left
[meta-velvetg] ...done (remove low & high coverage nodes).
[meta-velvetg] Scaffolding based on paired-end information ...
[MetaGraph] === Scaffolding with single peak mode ===
[VelvetGraph] === Rock Bank ===
[1.687007] Read coherency...
[1.690673] Identifying unique nodes
[1.693808] Done, 82558 unique nodes counted
[1.693821] Trimming read tips
[1.701690] Renumbering nodes
[1.701704] Initial node count 148377
[1.702079] Removed 0 null nodes
[1.702087] Confronted to 0 multiple hits and 0 null over 0
[1.702092] Read coherency over!
[VelvetGraph] === Create read paring array ===
[VelvetGraph] === Detach dubious reads ===
[VelvetGraph] === Activate gap markers ===
[VelvetGraph] === Scaffolding ===
[1.703117] Starting pebble resolution...
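Rather than waiting indefinitely and killing a hung run by hand, each attempt can be bounded with GNU coreutils `timeout`, which stops the command after a wall-clock limit and exits with status 124 when the limit is hit. This is a minimal bash sketch; the helper name `run_limited` and the 2-hour limit in the example are illustrative choices, not part of Velvet or MetaVelvet:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command under a wall-clock limit using GNU
# coreutils `timeout`. Returns the command's own exit status, or 124 if the
# limit was reached (timeout sends SIGTERM to the command in that case).
run_limited() {
  local limit="$1"
  shift
  local status
  timeout "$limit" "$@"
  status=$?
  if [ "$status" -eq 124 ]; then
    echo "timed out after $limit: $*" >&2
  fi
  return "$status"
}

# Example (assumes meta-velvetg is on PATH and Assem/ was prepared by velveth):
# run_limited 2h meta-velvetg Assem -ins_length 350
```

A status of 124 then marks a run that stalled (for example at pebble resolution) without tying up a core overnight.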
Any help would be really appreciated.
Thank you for your suggestions, rtliu, but unfortunately none of them worked. I tried, in different combinations, the -scaffolding no parameter and MAXKMERLENGTH=303. Every time I ran MetaVelvet, the run ended in exactly the same way, freezing at the "Starting pebble resolution..." step.
As many people have used MetaVelvet successfully, I started to suspect that something was wrong with my computer and/or Linux installation. So I installed Velvet and MetaVelvet in a virtual machine running Ubuntu 12.04.5 LTS, and this time MetaVelvet worked perfectly. I also noticed that different versions of g++ were used to compile Velvet and MetaVelvet: g++ 4.6.3 in Ubuntu 12.04 (the version that worked) and g++ 4.8.2 in Ubuntu 14.04. So I suspect that this behaviour is due to differences between these two versions of the g++ compiler.
TL;DR: MetaVelvet doesn't run properly under Ubuntu 14.04 when compiled with g++ 4.8.2, but works in Ubuntu 12.04.5 (g++ 4.6.3).
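To confirm which compiler actually produced a given build, one option is to read the identification string that GCC normally records in an ELF binary's `.comment` section, using `readelf` from GNU binutils. This is a minimal sketch, assuming binutils is installed and the binary still has its `.comment` section (locally built binaries normally do; distribution packages are often stripped of it). `built_with` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the compiler identification strings (e.g.
# "GCC: (Ubuntu ...) 4.8.2") stored in an ELF binary's .comment section.
# Requires `readelf` from GNU binutils. Returns 1 if the file does not exist
# or no GCC string is found.
built_with() {
  local bin="$1"
  [ -f "$bin" ] || return 1
  # -p .comment dumps the section as strings; keep only the GCC entries.
  readelf -p .comment "$bin" 2>/dev/null | grep -o 'GCC: .*' | sort -u | grep .
}

# Example (assumes the binaries are on PATH):
# built_with "$(command -v meta-velvetg)"
# built_with "$(command -v velvetg)"
```

Running it against both the Ubuntu 12.04 and the Ubuntu 14.04 builds would show directly which compiler version produced each one.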
Omega was recently released as an overlap-graph de novo assembler for metagenomics:
http://bioinformatics.oxfordjournals.org/content/30/19/2717
Or try IDBA-UD, which is pretty good for metagenomic assembly:
http://i.cs.hku.hk/~alse/hkubrg/projects/idba_ud/index.html
Thank you for the recommendations, rtliu; these two programs were already my top two alternatives. I am also considering trying SPAdes. But I really needed MetaVelvet output for comparison, as it is (probably) the most popular metagenomic assembler nowadays.
Have you used the parameter "exp_cov auto"? As a last resort, write to the MetaVelvet author. By the way, I have tried SPAdes on one lane of HiSeq metagenomic data, but it ran out of memory on my server.
Yes, I used this parameter every time I ran the assembler; it didn't help. I am almost sure that this behaviour is related to the version of the g++ compiler, so I will let the MetaVelvet developers know about this issue soon.
As I am working with a MiSeq dataset, I am inclined to try SPAdes; I have read many good things about it.
Did you try downgrading g++ to 4.6.3?
Thanks!