Question

Do you NEED to bin a metagenome into MAGs?

4.0 years ago

robert.murphy ▴ 90

I have been reading about metagenome analysis recently and came across annotation methods that use raw reads or just assembled metagenomic sequences. Does it bias the annotation if you do not assemble and bin the metagenome into MAGs, or is it safe to assume that, when annotating raw reads, you will not get any "hybrid" gene where half comes from one theoretical MAG and half from another?

Apologies if this is a silly question, I am not too familiar with how annotators work.

Assembly binning • 2.1k views

ADD COMMENT • link updated 4.0 years ago by antonioggsousa 3.2k • written 4.0 years ago by robert.murphy ▴ 90

score 0 · Answer 1 · 2020-12-09

4.0 years ago

antonioggsousa 3.2k

Hi,

I guess the difference comes down to how much confidence you can place in the annotation. Annotating raw reads (I assume you mean short reads; "raw" suggests reads that have not been quality- and adapter-trimmed, and they should be) is harder than annotating assembled contigs or scaffolds from a MAG (>1 kb), simply because the reads are so short (~150 bp). In particular, it is very difficult to assign a taxonomic origin to such reads, except for those that come from taxonomic markers such as ribosomal RNA genes. You can still annotate them functionally with some success by running InterProScan against the InterPro database of protein domains and families, because those matches rely on short conserved regions. This approach was used by an earlier version of the EBI metagenomics pipeline (https://www.ebi.ac.uk/metagenomics/pipelines/4.1), although one could argue that it was chosen because of the computational limits of an online service. The most recent version does perform assembly (https://www.ebi.ac.uk/metagenomics/pipelines/5).

Therefore, if you want to annotate your metagenomes at taxonomic and functional levels as best as possible, i.e., get better annotations of genes and taxa, I would say that you need to try to assemble your metagenomes.

I hope this helps,

António

ADD COMMENT • link 4.0 years ago by antonioggsousa 3.2k

Thank you, this is informative. The main reason I ask is that this paper concluded that, in shallow metagenomes, annotating the short reads can work better, although it does call some false positives. From what I understand of what you have said, though, there is no real need to bin an assembled metagenome prior to annotation, and skipping binning will not bias the result?

ADD REPLY • link 4.0 years ago by robert.murphy ▴ 90

I'm not an expert and probably not the best person to answer that question. Binning and assembly are often done together, but I don't think you necessarily need to bin your reads or metagenomic contigs. That said, binning them on GC% and/or k-mer frequency can be useful to obtain MAGs and get better taxonomic and functional annotations.
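
To make "binning on GC% and/or k-mer frequency" a bit more concrete, here is a minimal Python sketch of the two composition features. The function names and the toy sequences are made up for illustration. This is not how any particular binner is implemented, and real binning tools typically combine composition with other signals such as coverage.

```python
from collections import Counter
from itertools import product


def gc_fraction(seq):
    """Fraction of G/C among the unambiguous (A/C/G/T) bases of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def kmer_profile(seq, k=4):
    """Normalised k-mer frequency vector (4**k entries, fixed order).

    Windows containing anything other than A/C/G/T are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def profile_distance(p, q):
    """Euclidean distance between two k-mer profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


# Toy "contigs" (hypothetical data): a and b share the same repeat unit,
# c has a very different composition.
unit = "ATATTTAAATATTAAT"
contig_a = unit * 20
contig_b = (unit[5:] + unit[:5]) * 20
contig_c = "GCGGCCGCGCCGGCGC" * 20

for name, seq in [("a", contig_a), ("b", contig_b), ("c", contig_c)]:
    print(name, "GC fraction:", round(gc_fraction(seq), 3))

print("distance a-b:", profile_distance(kmer_profile(contig_a), kmer_profile(contig_b)))
print("distance a-c:", profile_distance(kmer_profile(contig_a), kmer_profile(contig_c)))
```

Contigs with a small profile distance and similar GC% are the ones a composition-based approach would tend to put in the same bin.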

ADD REPLY • link 4.0 years ago by antonioggsousa 3.2k

Would binning not just give you a better crossover between the two, rather than a better individual analysis of either? My main worry with not binning is that a gene might be called that spans the end of one contig and the start of another, where each contig came from a different theoretical MAG, but I am not sure whether that is even possible.

ADD REPLY • link 4.0 years ago by robert.murphy ▴ 90

This is my understanding, but I only did it once, a while ago. After you assemble your contigs, you can bin them into clusters of similar k-mer frequency and/or GC% (among other features). These bins represent contigs from closely related species/strains and, depending on genome completeness and contamination, you might get MAGs. In any case, since you have a cluster of sequences from the same species/strain, in principle it should be easier to annotate them. I think the assembly itself, although it can be improved, does not change with or without binning. Again, I do not have much experience with this.

Regarding your second question, I think that might happen with closely related species or strains: if you have multiple Enterobacteriaceae in a sample, it might be difficult to tell whether a contig belongs to one species or another. This problem is not avoided if you just BLAST your contigs, or the ORFs/CDSs on each contig, though. It will still be difficult to figure out the right annotation.
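
On the worry about a gene spanning the end of one contig and the start of another: as far as I know, gene callers work on one sequence at a time, so a call cannot join two contigs. A gene that runs off the edge of a contig is at most reported as partial, and a truly "hybrid" gene would have to sit inside a chimeric contig produced at the assembly step. The toy Python sketch below illustrates the per-contig logic only. It is heavily simplified (forward strand only, ATG starts only), and the function names and sequences are hypothetical, not taken from any real tool.

```python
STOPS = {"TAA", "TAG", "TGA"}


def forward_orfs(contig, min_codons=30):
    """Yield (start, end, complete) for ATG-initiated ORFs in the three
    forward frames of ONE contig. `complete` is False when the ORF runs
    off the contig end without reaching a stop codon (a partial gene).
    """
    seq = contig.upper()
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    yield (start, i + 3, True)
                start = None
        # Open ORF at the contig edge: report it as partial, never extend
        # it into another contig.
        if start is not None:
            n_codons = (len(seq) - start) // 3
            if n_codons >= min_codons:
                yield (start, start + 3 * n_codons, False)


def call_genes(contigs, min_codons=30):
    """Run the scan on each contig independently; results are keyed by contig."""
    return {name: list(forward_orfs(seq, min_codons)) for name, seq in contigs.items()}


# Hypothetical contigs: c1 holds a complete ORF, c2 an ORF cut by the contig end.
contigs = {
    "c1": "ATG" + "GCT" * 40 + "TAA",
    "c2": "CC" + "ATG" + "GCT" * 40,
}

for name, calls in call_genes(contigs).items():
    for start, end, complete in calls:
        print(name, start, end, "complete" if complete else "partial")
```

Because each contig is scanned on its own, binning (or not binning) does not change which genes are called; it only changes how the contigs carrying those genes are grouped afterwards.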

ADD REPLY • link 4.0 years ago by antonioggsousa 3.2k