Question

Would you recommend me a binning tool ?


16 months ago

v.berriosfarias ▴ 140

Hello! Over the last few months I have been producing metagenome-assembled genomes from different datasets, and for this I am currently using metabat2, vamb and CONCOCT.

The problem is that CONCOCT takes too long to run and it's a bottleneck in my analysis. Could you suggest a binning tool with demonstrably good binning results that is easy to install and fast to run? I tried rosella (https://github.com/rhysnewell/rosella), but I had problems with the installation.

Thanks!

Valentín.

MAGs metagenomics binning


Answer · 2023-07-19


16 months ago

Mensur Dlakic ★ 28k

The problem is that CONCOCT takes too much time to run and it's a bottleneck on my analysis, would you suggest me a binning tool with a demonstrated good binning results that is easy to install and fast to use?

With all due respect: do you want to get this done fast, or do you want to get it done right? We are not talking about CONCOCT taking days to bin metagenomes, but rather minutes or at most hours. I'm pretty sure metabat2 will bin a typical dataset in under 2 minutes. I don't have much experience with vamb beyond testing it a couple of times, but I thought it was fairly fast.

I wrote a binning tool that I can't recommend to you, because it is not fast. On most ordinary datasets it takes 30-60 minutes, but on datasets with 9+ billion assembled bases and 300+ bins it might take 10-20 hours. Yet I do not care about the time it takes: the computer does all the work, and the next day I just collect the results.

Unless your goal is to process hundreds of metagenomes per day over a long period of time (other than JGI and a few sequencing centers, I don't know of any entity with this need), I suggest you stick with the proven tools you already have. Better yet, to do things right, it might be a good idea to run all the binners and find their consensus.
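Dedicated tools such as DAS Tool or metaWRAP's bin_refinement module build a consensus properly, scoring bins with marker-gene completeness and contamination estimates. As a much simpler illustration of what "agreement between binners" means, here is a naive sketch (not the method used by those tools): given hypothetical contig-to-bin tables from two binners and the contig lengths, it finds, for each bin of the first binner, the bin of the second binner that shares the most bases with it. Parsing each tool's output into these dictionaries is assumed and left out; `best_matches` is a hypothetical helper name.

```python
from collections import defaultdict


def best_matches(bins_a, bins_b, lengths):
    """For each bin from binner A, find the bin from binner B sharing the most bases.

    bins_a, bins_b: dict of contig name -> bin name (a contig missing from a dict
    is treated as unbinned by that tool).
    lengths: dict of contig name -> contig length in bases.
    Returns: dict of bin_a -> (best bin_b or None, fraction of bin_a's bases shared).
    """
    size_a = defaultdict(int)                       # bin in A -> total bases
    shared = defaultdict(lambda: defaultdict(int))  # bin in A -> bin in B -> shared bases
    for contig, bin_a in bins_a.items():
        size_a[bin_a] += lengths[contig]
        bin_b = bins_b.get(contig)
        if bin_b is not None:
            shared[bin_a][bin_b] += lengths[contig]

    result = {}
    for bin_a, total in size_a.items():
        overlaps = shared.get(bin_a)
        if overlaps:
            # bin from B covering the largest number of this bin's bases
            bin_b, n = max(overlaps.items(), key=lambda item: item[1])
            result[bin_a] = (bin_b, n / total)
        else:
            # none of this bin's contigs were binned by B
            result[bin_a] = (None, 0.0)
    return result
```

For example, with made-up inputs `best_matches({"c1": "mb.1", "c2": "mb.1"}, {"c1": "v1", "c2": "v2"}, {"c1": 5000, "c2": 3000})` pairs `mb.1` with `v1`, which covers 62.5% of its bases. Bins whose best match covers most of their bases are supported by both tools; low fractions flag bins on which the binners disagree and which deserve a closer look.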



Yes, I forgot to mention that my dataset is very large (Antarctic soil metagenomes): on an HPC, the main CONCOCT command took 8 days to finish and metabat2 took two days. I don't want to sacrifice binning quality to get things done faster; I just want to know whether there is a binner that may outperform CONCOCT in both run time and binning quality. Thanks for the metaWRAP recommendation; its refinement module has helped me a lot in previous projects.

Thanks so much for your answer.



To me this sounds like a non-optimal use of resources, possibly because of a slow disk or memory shared with other processes. Have you monitored memory and CPU usage during the run? I ran metabat2 on a ~0.9-billion-base dataset without any problem on a computer with 256 GB of memory, and memory usage never went above 1%.

time metabat2 -i contigs.fa -o bins -t 40 --unbinned

MetaBAT 2 (GIT-NOTFOUND) using minContig 2500, minCV 1.0, minCVSum 1.0, maxP 95%, minS 60, maxEdges 200 and minClsSize 200000. with random seed=1689814338
211 bins (908873498 bases in total) formed.
2227.618u 24.664s 1:37.64 2306.7%       0+0k 2384+1906192io 71pf+0w
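The last line of that output is the summary printed by the csh/tcsh `time` builtin: user CPU seconds, system CPU seconds, elapsed wall time and CPU utilization (bash's `time` keyword prints a different real/user/sys layout instead). A minimal Python sketch, assuming only this default tcsh format and using hypothetical function names, that turns such a line into the average number of busy cores:

```python
import re

# Default csh/tcsh `time` summary, e.g.
# "2227.618u 24.664s 1:37.64 2306.7% 0+0k 2384+1906192io 71pf+0w"
TIME_RE = re.compile(r"([\d.]+)u\s+([\d.]+)s\s+([\d:.]+)\s+([\d.]+)%")


def elapsed_seconds(text):
    """Convert an elapsed time like '1:37.64' or '1:02:03' to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def cpu_summary(line):
    """Parse a csh-style time line and compute the average number of busy cores."""
    m = TIME_RE.search(line)
    if m is None:
        raise ValueError("not a csh/tcsh-style time summary")
    user, system = float(m.group(1)), float(m.group(2))
    wall = elapsed_seconds(m.group(3))
    return {
        "user_s": user,
        "sys_s": system,
        "wall_s": wall,
        # total CPU time divided by wall time = cores kept busy on average
        "busy_cores": (user + system) / wall,
        "reported_pct": float(m.group(4)),
    }
```

For the run above, (2227.618 + 24.664) / 97.64 ≈ 23.07, matching the reported 2306.7%: with `-t 40`, metabat2 kept about 23 cores busy on average. If a multi-day run showed a ratio close to 1 despite many requested threads, that would be consistent with the process waiting on disk rather than computing.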

metabat2 is a compiled C++ program with a very small memory footprint, and it should be able to handle datasets much larger than 1 billion bases without any problem. I suspect a slow disk or inadequate memory.
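One way to test the slow-disk or low-memory suspicion is GNU `/usr/bin/time -v`, which reports "Maximum resident set size" alongside CPU and wall time. If you prefer to wrap the run in a script, here is a minimal Python sketch using the standard `resource` module; `run_and_measure` is a hypothetical helper, and it assumes a Unix system (the `resource` module is not available on Windows).

```python
import resource
import subprocess
import sys
import time


def run_and_measure(cmd):
    """Run cmd (list of args); return (exit code, wall s, CPU s, peak RSS in MiB).

    getrusage(RUSAGE_CHILDREN) only covers children that have finished and been
    waited for. ru_maxrss is the peak of the largest such child so far, so call
    this once per fresh Python process to get a clean number.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    proc = subprocess.run(cmd)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)

    # user + system CPU time consumed by this child
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    # ru_maxrss is in kilobytes on Linux but in bytes on macOS
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    peak_mib = after.ru_maxrss / divisor
    return proc.returncode, wall, cpu, peak_mib
```

Usage would look like `run_and_measure(["metabat2", "-i", "contigs.fa", "-o", "bins", "-t", "40"])`. If `cpu / wall` stays far below the number of requested threads while peak memory is small, the job is probably spending its time waiting on I/O; if peak memory approaches the node's limit, swapping is the more likely culprit.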
