I have been reading about metagenome analysis recently and came across annotation methods that use raw reads or just the assembled metagenomic sequences. Does it bias the annotation to not assemble and bin the metagenome into MAGs, or is it safe to assume that, when annotating raw reads, you will not get any "hybrid" gene where half comes from one theoretical MAG and half from another?
Apologies if this is a silly question, I am not too familiar with how annotators work.
Thank you, this is informative. The main reason I ask is that this paper concluded that in shallow metagenomes annotation of the short reads can be better, although it does call some false positives. From what I understand of what you have said, though, there is no real need to bin an assembled metagenome prior to annotation, and skipping binning will not result in bias?
I'm not an expert and probably not the best person to answer that question. Binning and assembly are often related, but I don't think you necessarily need to bin your reads or metagenomic contigs. That said, binning your reads based on GC% and/or k-mer frequency can be useful to obtain MAGs and get better taxonomic and functional annotations.
Would binning not only give you a better crossover between the two, but not a better individual analysis in either? My main worry with not binning is that a gene might be called that spans the end of one contig and the start of another, where each contig came from a different theoretical MAG, but I am not sure if that is even possible?
This is my understanding, but I have only done it once, a while ago. After you assemble your contigs, you can bin them into clusters with similar k-mer frequency and/or GC% (among other features). These bins represent contigs/sequences from closely related spp./strains and, depending on genome completeness and contamination, you might get MAGs. In any case, since you have a cluster of sequences from the same spp./strain, in principle it should be easier to annotate those sequences. I think that, although the assembly can be improved, it will not change with or without binning. Again, this is my understanding, and I do not have much experience.
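To make the features a bit more concrete, here is a minimal Python sketch of how GC% and a tetranucleotide (k=4) frequency vector could be computed per contig. It only illustrates the idea: the contig names and sequences are made up, the function names are my own, and real binners combine these features with others (coverage, for example) and do the clustering step themselves.

```python
from collections import Counter
from itertools import product


def gc_content(seq: str) -> float:
    """Fraction of G and C among the A/C/G/T bases (ambiguous bases ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def kmer_frequencies(seq: str, k: int = 4) -> list[float]:
    """Normalised k-mer frequencies in a fixed order over all A/C/G/T k-mers."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


# Hypothetical toy contigs, far shorter than anything you would really bin.
contigs = {
    "contig_1": "ATATATTTAAATATATTAAATTTATATA",
    "contig_2": "GCGCGGCCGCGGGCCCGCGCGGCGCCGC",
    "contig_3": "ATTTATAAATATTTAATATAAATTTATA",
}

# One feature vector per contig: GC% followed by 256 tetranucleotide frequencies.
features = {
    name: [gc_content(seq)] + kmer_frequencies(seq)
    for name, seq in contigs.items()
}

for name, vector in features.items():
    print(name, f"GC={vector[0]:.2f}", f"n_features={len(vector)}")
```

Contigs whose vectors lie close together would be candidates for the same bin. The clustering itself is left out here because it is what the binning tools actually do for you.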
Regarding your second question, I think that might happen with closely related species or strains. If you have multiple Enterobacteriaceae in a sample, it might be difficult to figure out whether a contig belongs to one sp. or another. However, this problem is not mitigated when you just BLAST your contigs or ORFs/CDSs per contig; it will still be difficult to figure out the right annotation.