6.0 years ago
vijinim
I'm new to the field of metagenomics and I have been reading research on binning in metagenomics. I came across two types of metagenomic binning tools:
- binning of reads (before assembly), e.g. MetaProb, BiMeta
- binning of contigs (after assembly), e.g. MaxBin, MetaWatt
My understanding is that we bin reads and contigs to identify which group of closely related organisms or OTU they belong to as there can be a huge number of organisms in a metagenomic sample.
Can someone please explain to me why we bin before assembly (reads) and after assembly (contigs), what each is intended to achieve, and what the consequences of each are?
Sorry if this is a naive question. I would really appreciate an explanation. :) Thank you!
IMO, binning reads before assembly is about removing redundancy (it makes the assembly process less costly), whereas binning contigs after assembly is about discovering individual genomes in a metagenome assembly.
Edit: There's also another use case for binning reads, one in which no subsequent assembly is expected, i.e. when you attempt to estimate species composition from reads alone.
How can we remove redundancy by binning reads?
I thought that we bin reads so we can separate the reads belonging to different species and use the sequences in each bin to assemble genomes of those species separately.
Well, yes, you're correct. Binning is incorrect terminology. In this scenario I would have called it simply clustering. If you're binning reads based on k-mer content and assembly is expected, then I suppose the goal would be partitioned assembly of complete genomes from metagenomic data.
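To make "binning based on k-mer content" a bit more concrete, here is a minimal Python sketch of the general idea: turn each sequence (read or contig) into a vector of k-mer frequencies and compare the vectors. The function names `kmer_profile` and `cosine_similarity`, the choice of k = 4, and the use of cosine similarity are my own assumptions for illustration. This is not how MetaProb, BiMeta, MaxBin or MetaWatt are actually implemented; real binners typically combine composition with other signals (such as coverage) and more careful statistics.

```python
from collections import Counter
from itertools import product
import math


def kmer_profile(seq, k=4):
    """Return the relative frequency of every possible DNA k-mer in seq.

    Hypothetical helper for illustration: k-mers containing characters
    other than A, C, G, T (e.g. N) are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers)
    if total == 0:
        # Sequence shorter than k, or no valid k-mers at all
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy sequences, made up for the example (not real data)
    seq_a = "ATATATATATATATATATAT"
    seq_b = "TATATATATATATATATATA"
    seq_c = "GCGCGCGCGCGCGCGCGCGC"
    pa, pb, pc = (kmer_profile(s) for s in (seq_a, seq_b, seq_c))
    print("a vs b:", cosine_similarity(pa, pb))
    print("a vs c:", cosine_similarity(pa, pc))
```

Sequences with similar profiles would then be grouped by some clustering method. Note that short reads give very noisy profiles because each read contains only a few k-mers, whereas contigs are longer and give more stable ones.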
Thank you very much for the explanations and clarifications. :)
So, which is better: binning reads, or binning contigs after assembly?