I have a metagenomic dataset with lots of single-celled algae, diatoms, and bacteria. I can't use my normal pipeline ([MULTIPLE BINNING PROGRAMS] -> DAS Tool -> GTDB-Tk) because the latter two tools are built specifically for prokaryotes.
How can I determine which metagenome-assembled genomes (MAGs) are eukaryotic? These are the options I know about:
- EukRep - This works at the contig level and is ML-based. I've run "positive controls" (known eukaryotic contigs) through EukRep in the past, and it failed to classify all of them as eukaryotic, so I don't have complete faith in this method; there seemed to be a high false-negative rate.
- BBSuite's sketch.sh - This worked pretty well in the past, but AFAIK it depends heavily on a reference database that isn't actively updated, and its results are strongly affected by genome completeness (other methods may be as well).
- Mash/Sourmash - Same caveats as above, though I've had more errors with these.
- BARRNAP - Identify 16S/18S rRNA genes, then see which is more prominent. Not sure how I feel about this.
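To make the rRNA comparison a bit more systematic, here is a minimal Python sketch that tallies barrnap calls for one MAG. It assumes barrnap was run on the MAG once with `--kingdom bac` and once with `--kingdom euk`, that the two GFF3 outputs were concatenated, and that features carry `Name=` attributes such as `16S_rRNA` or `18S_rRNA`. The function names and the simple majority rule are illustrative, not part of barrnap. 5S is ignored because both groups have it. One caveat: diatoms and other plastid-bearing algae carry bacterial-like plastid 16S rRNA genes, which may be picked up in bacterial mode, so a 16S hit alone does not rule out a eukaryotic MAG.

```python
"""Tally barrnap rRNA calls for one MAG and report which domain dominates.

Assumption: barrnap was run on the MAG with --kingdom bac and --kingdom euk,
and the GFF3 outputs were concatenated. The Name= values used below
(16S_rRNA, 18S_rRNA, ...) are assumed from barrnap's GFF3 attribute column.
"""
from collections import Counter

# Hypothetical mapping from barrnap feature names to a coarse domain label.
# 5S is left out on purpose because both prokaryotes and eukaryotes have it.
PROK_NAMES = {"16S_rRNA", "23S_rRNA"}
EUK_NAMES = {"18S_rRNA", "28S_rRNA", "5_8S_rRNA"}


def rrna_counts(gff_text: str) -> Counter:
    """Count rRNA features by their Name= attribute in GFF3 text."""
    counts = Counter()
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and GFF headers/comments
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "rRNA":
            continue  # not a well-formed rRNA feature line
        for attr in fields[8].split(";"):
            key, _, value = attr.partition("=")
            if key == "Name":
                counts[value] += 1
    return counts


def dominant_domain(counts: Counter) -> str:
    """Return 'eukaryote', 'prokaryote', or 'ambiguous' from rRNA counts."""
    prok = sum(counts[n] for n in PROK_NAMES)
    euk = sum(counts[n] for n in EUK_NAMES)
    if euk > prok:
        return "eukaryote"
    if prok > euk:
        return "prokaryote"
    return "ambiguous"  # ties, including MAGs with no diagnostic rRNA hits


if __name__ == "__main__":
    example = (
        "##gff-version 3\n"
        "c1\tbarrnap:0.9\trRNA\t10\t1800\t0\t+\t.\tName=18S_rRNA;product=18S ribosomal RNA\n"
        "c2\tbarrnap:0.9\trRNA\t5\t1500\t0\t-\t.\tName=16S_rRNA;product=16S ribosomal RNA\n"
        "c1\tbarrnap:0.9\trRNA\t2000\t5000\t0\t+\t.\tName=28S_rRNA;product=28S ribosomal RNA\n"
    )
    counts = rrna_counts(example)
    print(dict(counts))
    print(dominant_domain(counts))
```

Running the file prints the per-name counts and the resulting call for the small embedded example. Many MAGs will have no rRNA genes at all, since rRNA operons often fail to assemble or bin, which is part of why this approach feels shaky on its own.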
Is there something akin to GTDB-Tk that can handle eukaryotic MAGs?
To reiterate, I want to bin my contigs into MAGs first and then classify each MAG as eukaryotic or prokaryotic (preferably not at the contig level). What are the recommended methods for this?
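Because the goal is one call per MAG rather than per contig, one possible workaround is to bin first and then aggregate contig-level predictions (for example, the contigs EukRep places in its eukaryotic output) over each bin, weighting by contig length. The sketch below assumes the contig-to-bin table, contig lengths, and the set of eukaryotic contig IDs have already been parsed into Python dictionaries/sets; the function names and the 0.8/0.2 cutoffs are hypothetical placeholders that would need calibrating, e.g. against positive controls like the ones mentioned for EukRep.

```python
"""Aggregate contig-level domain calls into one call per MAG.

Assumptions (hypothetical, not any specific tool's API):
- contig_to_bin maps contig IDs to bin/MAG names (e.g. parsed from a
  binner's or DAS Tool's contig-to-bin table).
- contig_lengths maps contig IDs to lengths in bp.
- euk_contigs is the set of contig IDs a contig-level classifier such as
  EukRep called eukaryotic.
"""
from collections import defaultdict


def euk_fraction_per_bin(contig_to_bin, contig_lengths, euk_contigs):
    """Return {bin: fraction of the bin's total bp on contigs called eukaryotic}."""
    total_bp = defaultdict(int)
    euk_bp = defaultdict(int)
    for contig, bin_id in contig_to_bin.items():
        length = contig_lengths[contig]
        total_bp[bin_id] += length
        if contig in euk_contigs:
            euk_bp[bin_id] += length
    # Skip zero-length bins to avoid dividing by zero.
    return {b: euk_bp[b] / total_bp[b] for b in total_bp if total_bp[b] > 0}


def classify_bins(fractions, euk_min=0.8, prok_max=0.2):
    """Label bins by length-weighted vote; cutoffs are arbitrary placeholders."""
    labels = {}
    for bin_id, frac in fractions.items():
        if frac >= euk_min:
            labels[bin_id] = "eukaryote"
        elif frac <= prok_max:
            labels[bin_id] = "prokaryote"
        else:
            labels[bin_id] = "mixed/uncertain"  # possibly a chimeric bin
    return labels


if __name__ == "__main__":
    contig_to_bin = {"c1": "bin1", "c2": "bin1", "c3": "bin2", "c4": "bin2"}
    lengths = {"c1": 90_000, "c2": 10_000, "c3": 50_000, "c4": 50_000}
    euk = {"c1", "c3"}
    fractions = euk_fraction_per_bin(contig_to_bin, lengths, euk)
    print(fractions)
    print(classify_bins(fractions))
```

A length-weighted vote is less sensitive to a handful of misclassified short contigs than a per-contig majority, and bins in the middle band are flagged rather than forced into a domain. If the classifier has a high false-negative rate, though, genuinely eukaryotic bins will drift toward the middle band, so the cutoffs matter.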
Check out this preprint that came out today on binning eukaryotic genomes:
https://www.biorxiv.org/content/10.1101/2021.07.25.453713v1.full.pdf
Wow... what are the chances that a bioRxiv preprint/tool comes out addressing the exact question I was asking? Pretty wild. Thanks for linking this. I've seen some activity on GitHub about it, so I'm excited to see it's going through peer review.