I have several contigs (~1 kb - 20 kb) from a metagenomic sample which, according to marker gene analysis, belong primarily to one eukaryote or several prokaryotes, both bacterial and archaeal. All of the contigs have a very similar GC content.
What is the best way to separate the eukaryotic DNA from the prokaryotic DNA without using references? (E.g., blasting each contig against a reference won't work, because most of the contigs are too far removed from any reference.) Codon biases? Looking for certain genomic features?
I've been running MaxBin (paper) on our latest metagenomic assemblies and could not be happier with the results. Unlike with, e.g., the ESOM approach, with MaxBin you do not need to provide dozens of parameters or select bins by hand. In essence, MaxBin estimates the number of bins through marker gene analysis, and then scaffolds are binned on the basis of coverage and tetranucleotide frequency. Highly recommended. I've been thinking of combining IDBA-UD (initial assembly), MaxBin (binning) and PRICE (targeted assembly of bins) for a really kick-ass pipeline.
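To give a feel for the tetranucleotide-frequency signal that this kind of binning relies on, here is a minimal Python sketch. It is not MaxBin's code or its algorithm: the function names `tetranucleotide_frequencies` and `euclidean_distance` are made up for illustration, the distance measure is just one simple choice, and the sketch ignores coverage and does not merge reverse-complement tetramers, which real binners often do.

```python
from collections import Counter
from itertools import product

# All 256 possible tetranucleotides over A, C, G, T, in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(seq):
    """Return the 256-element tetranucleotide frequency vector of a contig.

    Hypothetical helper for illustration. Windows containing anything other
    than A/C/G/T (e.g. N) are skipped, because they never match a key in KMERS.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in KMERS)
    if total == 0:
        # Contig too short or made of ambiguous bases only.
        return [0.0] * len(KMERS)
    return [counts[k] / total for k in KMERS]


def euclidean_distance(a, b):
    """Plain Euclidean distance between two frequency vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


if __name__ == "__main__":
    # Toy sequences only; real contigs would be read from a FASTA file.
    contig_a = "ATGCGCGATATCGCGCTAGCTAGCGCGATATATCGCG" * 10
    contig_b = "ATGCGCGATATCGCGCTAGCTAGCGCGATATATCGCC" * 10
    contig_c = "TTTTAAAATTTAAATTTTAAAATTTAAAATTTTTAAA" * 10
    fa, fb, fc = (tetranucleotide_frequencies(s)
                  for s in (contig_a, contig_b, contig_c))
    print("a vs b:", euclidean_distance(fa, fb))
    print("a vs c:", euclidean_distance(fa, fc))
```

Contigs from the same genome tend to have similar vectors, so small distances suggest (but do not prove) a shared origin. With contigs as short as 1 kb the vectors get noisy, which is part of why combining this signal with coverage helps.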