The majority of currently available metagenomic binning tools are designed for short reads and for contigs assembled from short reads.
Does anyone know of tools that can bin long reads, or contigs assembled from long reads?
Thank you very much! :)
I think Kraken (and possibly Centrifuge) can take long reads. I'm fairly sure Kraken can work on contigs too.
Thank you very much. I will try it and see. :)
What is the difference between binning long contigs assembled from short reads and binning long sequences obtained from long reads?
I believe there is no difference, apart from the effect of the different error rates of short and long reads.
However, I tried to bin a simulated dataset of reads from two bacterial genomes (read lengths of 20-21 kb, 10% error rate), and the tool failed to separate them into two bins: it produced only one bin containing a few sequences, and most of the remaining sequences were left unbinned. The tool I used was MaxBin 2.2.4.
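The thread does not say which simulator produced the reads. As a rough illustration of what a 10% per-base error rate means, here is a minimal Python sketch that applies substitutions, insertions and deletions to a sequence. The function name `add_errors` and the even split between the three error types are hypothetical assumptions, not MaxBin behavior or any particular simulator's error model.

```python
import random

BASES = "ACGT"

def add_errors(seq: str, error_rate: float, rng: random.Random) -> str:
    """Return a copy of `seq` with random errors at a combined per-base rate.

    Hypothetical error model: each erroneous position is equally likely to be a
    substitution, an insertion or a deletion. Real simulators use
    platform-specific error profiles.
    """
    out = []
    for base in seq:
        if rng.random() >= error_rate:
            out.append(base)  # this position is read correctly
            continue
        kind = rng.choice(("sub", "ins", "del"))
        if kind == "sub":
            out.append(rng.choice([b for b in BASES if b != base]))  # a different base
        elif kind == "ins":
            out.append(base)
            out.append(rng.choice(BASES))  # an extra random base after the true one
        # kind == "del": append nothing, so the true base is dropped
    return "".join(out)

rng = random.Random(42)
genome = "".join(rng.choice(BASES) for _ in range(20_000))  # random stand-in for a 20 kb region
read = add_errors(genome, 0.10, rng)
print(len(genome), len(read))
```

At a 10% rate, a 20 kb read carries about 2,000 errors on average; with the even split assumed here, roughly two thirds of them (about 1,300) are insertions or deletions.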
And how different were the two genomes? No tool will successfully separate, e.g., Escherichia coli O157:H7 Sakai from Escherichia coli O157:H7 EC4115.
I used Escherichia coli CFT073 and Staphylococcus aureus JP080. When I bin contigs assembled from short reads of these genomes, MaxBin produces two bins with good results.
However, when I ran MaxBin on long reads from the same two genomes, it produced only one bin.
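Composition-based binners typically compare tetranucleotide frequency (TNF) profiles, so one way to ask how different two genomes are is to compare those profiles. The sketch below is a simplification under stated assumptions: it uses the raw 256-dimensional profile (no merging of reverse complements) and Euclidean distance, and `tnf`, `tnf_distance` and the toy generator `random_seq` are hypothetical helpers. The random sequences only mimic GC content (about 50% for E. coli, about 33% for S. aureus); real genomes are not random.

```python
import math
import random
from collections import Counter
from itertools import product

# All 256 tetranucleotides in a fixed order, so that profiles are comparable.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]

def tnf(seq: str) -> list[float]:
    """Tetranucleotide frequency profile: 256 values that sum to 1 (all 0 for very short input)."""
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in KMERS) or 1  # k-mers containing N etc. are ignored
    return [counts[k] / total for k in KMERS]

def tnf_distance(a: str, b: str) -> float:
    """Euclidean distance between the TNF profiles of two sequences."""
    return math.dist(tnf(a), tnf(b))

def random_seq(gc: float, n: int, rng: random.Random) -> str:
    """Hypothetical toy generator: a random sequence with the given GC fraction."""
    at = (1 - gc) / 2
    return "".join(rng.choices("ACGT", weights=[at, gc / 2, gc / 2, at], k=n))

rng = random.Random(0)
ecoli_like = random_seq(0.50, 20_000, rng)
ecoli_like_2 = random_seq(0.50, 20_000, rng)
saureus_like = random_seq(0.33, 20_000, rng)
print(tnf_distance(ecoli_like, ecoli_like_2), tnf_distance(ecoli_like, saureus_like))
```

Two strains of the same serotype, such as the O157:H7 pair mentioned above, would be expected to have nearly identical profiles, which is why composition alone cannot separate them.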
Does MaxBin also use depth of coverage? That could be the reason, as you don't get that dimension with long reads.
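As described in the MaxBin publications, MaxBin combines tetranucleotide frequencies with contig coverage. The minimal sketch below, using a hypothetical helper `mean_depth`, shows why coverage carries little information when raw long reads are binned directly: the coverage of a sequence comes from other reads aligned to it, and a single read used as the sequence to bin is covered only by itself. Alignments are assumed to be given as half-open (start, end) intervals; producing them with a read mapper is outside the sketch.

```python
def mean_depth(contig_length: int, alignments: list[tuple[int, int]]) -> float:
    """Mean per-base depth of a contig, given half-open (start, end) read alignments on it."""
    aligned_bases = 0
    for start, end in alignments:
        # Clip each alignment to the contig; parts falling outside it are ignored.
        aligned_bases += max(0, min(end, contig_length) - max(start, 0))
    return aligned_bases / contig_length

# Hypothetical short-read assembly: contigs from a more abundant genome collect more reads.
print(mean_depth(1_000, [(0, 150)] * 40))  # 6.0
print(mean_depth(1_000, [(0, 150)] * 10))  # 1.5
# A raw long read used directly as the sequence to bin: only the read itself covers it.
print(mean_depth(20_000, [(0, 20_000)]))   # 1.0
```

In the short-read case, the depth difference between the two contigs is an extra signal a binner can use; in the long-read case every sequence gets the same value, so that dimension adds nothing.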
If MaxBin detects its marker genes at the protein level, I think your 10% simulated error rate alone could explain the single bin: insertion and deletion errors shift the reading frame, so the marker genes are no longer recognized.
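To illustrate the point about protein-level detection, here is a minimal Python sketch using the standard genetic code (NCBI translation table 1): a single-base deletion shifts the reading frame, so every codon downstream of it is read incorrectly and the original stop codon is lost. The toy ORF and the `translate` helper are hypothetical and stand in for the gene prediction step of a real pipeline, not for MaxBin's internals.

```python
# Standard genetic code (NCBI translation table 1), codons ordered by bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna: str) -> str:
    """Translate in frame 0; a trailing incomplete codon is dropped, unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))

gene = "ATGGCTGCTGCTGCTGCTTAA"      # toy ORF: M A A A A A, then stop
frameshifted = gene[:3] + gene[4:]  # one base deleted right after the start codon

print(translate(gene))          # MAAAAA*
print(translate(frameshifted))  # MLLLLL
```

A 1 kb gene read at a 10% per-base error rate carries about 100 errors on average, so unless nearly all of them are substitutions, frameshifts like this are almost unavoidable, and a protein-level marker search would struggle to find intact genes.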
Yes, I think this is the issue. I will look for other binning software. Thank you very much for your insights and explanations. :)