Hello! Over the last few months I have been producing metagenome-assembled genomes from different datasets, and for this purpose I'm currently using metabat2, vamb and CONCOCT.
The problem is that CONCOCT takes too long to run and it's a bottleneck in my analysis. Could you suggest a binning tool with demonstrated good binning results that is easy to install and fast to run? I tried rosella https://github.com/rhysnewell/rosella but I had problems with the installation.
With all due respect: do you want to get this done fast, or to get it done right? We are not talking about days for CONCOCT to bin a metagenome, but rather minutes or hours at most. I'm pretty sure that metabat2 will bin any dataset in under 2 minutes. I don't have much experience with vamb beyond testing it a couple of times, but I thought it was fairly fast.
I wrote a binning tool that I couldn't recommend to you, because it is not fast. On most ordinary datasets it takes 30-60 minutes, but there are datasets with 9+ billion assembled bases and 300+ bins on which it might take 10-20 hours. Yet I do not care about the time it takes, because the computer does all the work and next day I just collect the data.
Unless your goal is to process hundreds of metagenomes per day over a long period of time - other than JGI and a few sequencing centers I don't know any entity with this need - I suggest you stick with the proven tools you already have. Better yet, to do things right it might be a good idea to run all the binners and find their consensus.
Yes, I forgot to mention that my dataset is very large (Antarctic soil metagenomes): on an HPC, the main CONCOCT command took 8 days to finish and metabat2 took two days. I don't want to sacrifice binning quality to get things done faster; I just want to know whether there is a binner that outperforms CONCOCT in both run time and binning quality. Thanks for the metaWRAP recommendation; its refinement module has helped me a lot in previous projects.
To me this sounds like a non-optimal use of resources, possibly because of a slow disk or memory sharing with other processes. Have you monitored the memory and CPU usage during the run? I ran metabat2 on a dataset of ~0.9 billion bases without any problem on a computer with 256 GB of memory, and the memory usage was never above 1%.
time metabat2 -i contigs.fa -o bins -t 40 --unbinned
MetaBAT 2 (GIT-NOTFOUND) using minContig 2500, minCV 1.0, minCVSum 1.0, maxP 95%, minS 60, maxEdges 200 and minClsSize 200000. with random seed=1689814338
211 bins (908873498 bases in total) formed.
2227.618u 24.664s 1:37.64 2306.7% 0+0k 2384+1906192io 71pf+0w
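A note on reading that last line, assuming it is the csh/tcsh-style `time` format (user CPU seconds, system CPU seconds, elapsed wall time, CPU percentage): the percentage should be (user + system) / elapsed. The small helper below, whose name `busy_cores` is made up for this illustration, redoes that arithmetic with the numbers from the run above to estimate how many cores were busy on average.

```shell
#!/usr/bin/env bash
# busy_cores is a hypothetical helper: average number of busy cores,
# computed as (user CPU s + system CPU s) / elapsed wall-clock s.
busy_cores() {
    awk -v u="$1" -v s="$2" -v e="$3" 'BEGIN { printf "%.1f\n", (u + s) / e }'
}

# Values copied from the time line above; 1:37.64 is 97.64 seconds.
busy_cores 2227.618 24.664 97.64   # prints 23.1
```

About 23 of the 40 requested threads were busy on average in this run, which matches the reported 2306.7%. On a run that takes days, a value far below the thread count would be one hint that the job is waiting on disk or memory rather than computing.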
metabat2 is a compiled C++ program with almost no memory footprint, and it should be able to handle much larger datasets than 1 billion bases without any problem. I suspect a slow disk or inadequate memory.
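In case it helps with the monitoring question, here is a minimal sketch of one way to log CPU and memory usage while a binner runs. It assumes bash and a `ps` that accepts the `pcpu` and `rss` output fields (as procps on Linux does). The function name `monitor_run` and the log layout are made up for this example and are not part of any binning tool. It only samples the launched process itself, not child processes.

```shell
#!/usr/bin/env bash
# monitor_run LOG INTERVAL CMD [ARGS...]
# Hypothetical helper: runs CMD in the background and appends one sample
# (elapsed seconds, %CPU, resident memory in KiB) to LOG every INTERVAL seconds.
monitor_run() {
    local log=$1 interval=$2
    shift 2
    "$@" &                                   # start the command in the background
    local pid=$!
    local start=$SECONDS
    local sample
    printf 'elapsed_s\tcpu_pct\trss_kb\n' > "$log"
    while kill -0 "$pid" 2>/dev/null; do     # loop while the process is alive
        # pcpu = CPU usage in percent, rss = resident set size in KiB
        sample=$(ps -o pcpu= -o rss= -p "$pid" 2>/dev/null) || break
        # $sample is left unquoted on purpose so it splits into two fields
        [ -n "$sample" ] && printf '%s\t%s\t%s\n' $((SECONDS - start)) $sample >> "$log"
        sleep "$interval"
    done
    wait "$pid"                              # return the command's exit status
}
```

With the command from my run, this could be called as `monitor_run usage.tsv 30 metabat2 -i contigs.fa -o bins -t 40 --unbinned`. If the CPU column stays far below what the thread count allows, or the memory column approaches what the node can give you, that would support the slow disk or inadequate memory explanation.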
Thanks so much for your answer.