5.7 years ago
ARich
Dear Biostar user,
I have a question regarding metagenomic assembly statistics. I ran MEGAHIT and metaSPAdes on one paired-end sample, then ran MetaQUAST to see which of the two assemblers gives better statistics.
The output is a bit confusing, and I cannot tell which assembler I should choose for all samples.
Below are tables with some of the MetaQUAST results.
num_contigs.xlsx
| Assemblies | megahit | SPAdes |
|----------------------------------|--------:|-------:|
| Bacteroides_acidifaciens | 179 | 112 |
| Dorea_sp._5_2 | 107 | 78 |
| Lactobacillus_johnsonii | 24 | 14 |
| Lactobacillus_johnsonii_DPC_6026 | 26 | 21 |
| Lactobacillus_johnsonii_FI9785 | 13 | 6 |
| Lactobacillus_murinus | 20 | 11 |
| Lactobacillus_reuteri | 38 | 23 |
| Lactobacillus_reuteri_TD1 | 27 | 12 |
Misassembled_contigs_length
| Assemblies | megahit | SPAdes |
|----------------------------------|---------|---------|
| Bacteroides_acidifaciens | 2082038 | 1727134 |
| Dorea_sp._5_2 | 173212 | 129559 |
| Lactobacillus_johnsonii | 208152 | 153022 |
| Lactobacillus_johnsonii_DPC_6026 | 231804 | 205378 |
| Lactobacillus_johnsonii_FI9785 | 126758 | 96387 |
| Lactobacillus_murinus | 24519 | 12355 |
| Lactobacillus_reuteri | 93062 | 50979 |
| Lactobacillus_reuteri_TD1 | 67949 | 39834 |
Largest_contig.txt
| Assemblies | megahit | SPAdes |
|----------------------------------|--------:|-------:|
| Bacteroides_acidifaciens | 349575 | 117688 |
| Dorea_sp._5_2 | 150971 | 199679 |
| Lactobacillus_johnsonii | 46811 | 54841 |
| Lactobacillus_johnsonii_DPC_6026 | 46811 | 54841 |
| Lactobacillus_johnsonii_FI9785 | 46811 | 54841 |
| Lactobacillus_murinus | 6067 | 5492 |
| Lactobacillus_reuteri | 26761 | 25341 |
| Lactobacillus_reuteri_TD1 | 26761 | 13863 |
| not_aligned | 244235 | 132156 |
I think MEGAHIT performed better in terms of contig length, but I would like feedback from an expert.
Looking forward to your feedback. Thank you in advance!
Unfortunately, your question is not that simple to answer. For largest contig length, the "best" assembler depends on the species. For misassembled contig length, however, SPAdes performs better than MEGAHIT (lower values) for every species, so looking at just these two metrics, I would say SPAdes is better. You should also evaluate other quality metrics, such as the number of genes annotated, the percentage of reads mapping back to the assembled metagenome, and so on. Two papers to help you out:
Critical Assessment of Metagenome Interpretation—a benchmark of metagenomics software
Comparing and Evaluating Metagenome Assembly Tools from a Microbiologist’s Perspective - Not Only Size Matters!
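To make the per-species comparison above concrete, here is a minimal Python sketch that counts, for each metric, how many reference genomes each assembler "wins". The numbers are copied from the tables in the question (the `not_aligned` row is left out because it is not a reference genome). The helper name `count_wins` and the choice of which direction counts as "better" for each metric are my own assumptions, not something MetaQUAST reports.

```python
# Values copied from the question's MetaQUAST tables, as (MEGAHIT, SPAdes).
num_contigs = {
    "Bacteroides_acidifaciens": (179, 112),
    "Dorea_sp._5_2": (107, 78),
    "Lactobacillus_johnsonii": (24, 14),
    "Lactobacillus_johnsonii_DPC_6026": (26, 21),
    "Lactobacillus_johnsonii_FI9785": (13, 6),
    "Lactobacillus_murinus": (20, 11),
    "Lactobacillus_reuteri": (38, 23),
    "Lactobacillus_reuteri_TD1": (27, 12),
}
misassembled_length = {
    "Bacteroides_acidifaciens": (2082038, 1727134),
    "Dorea_sp._5_2": (173212, 129559),
    "Lactobacillus_johnsonii": (208152, 153022),
    "Lactobacillus_johnsonii_DPC_6026": (231804, 205378),
    "Lactobacillus_johnsonii_FI9785": (126758, 96387),
    "Lactobacillus_murinus": (24519, 12355),
    "Lactobacillus_reuteri": (93062, 50979),
    "Lactobacillus_reuteri_TD1": (67949, 39834),
}
largest_contig = {
    "Bacteroides_acidifaciens": (349575, 117688),
    "Dorea_sp._5_2": (150971, 199679),
    "Lactobacillus_johnsonii": (46811, 54841),
    "Lactobacillus_johnsonii_DPC_6026": (46811, 54841),
    "Lactobacillus_johnsonii_FI9785": (46811, 54841),
    "Lactobacillus_murinus": (6067, 5492),
    "Lactobacillus_reuteri": (26761, 25341),
    "Lactobacillus_reuteri_TD1": (26761, 13863),
}


def count_wins(table, lower_is_better):
    """Count references where each assembler has the better value.

    Hypothetical helper: "better" is decided only by lower_is_better,
    which is an assumption made per metric by the caller.
    """
    wins = {"megahit": 0, "spades": 0, "tie": 0}
    for megahit, spades in table.values():
        if megahit == spades:
            wins["tie"] += 1
        elif (megahit < spades) == lower_is_better:
            wins["megahit"] += 1
        else:
            wins["spades"] += 1
    return wins


# Assumption: fewer contigs and less misassembled sequence are better,
# while a longer largest contig is better.
print("num_contigs:", count_wins(num_contigs, lower_is_better=True))
print("misassembled_length:", count_wins(misassembled_length, lower_is_better=True))
print("largest_contig:", count_wins(largest_contig, lower_is_better=False))
```

On these tables, SPAdes has the lower value for all eight references on both contig count and misassembled length, while largest contig is split four to four. Note that the three *Lactobacillus johnsonii* references show identical largest-contig values, which may mean the same contig is being aligned to closely related genomes, so those rows are probably not independent observations. A simple win count like this also ignores how large the differences are, so treat it as a first look rather than a decision rule.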
Thank you for these papers. They were really helpful. I have another question regarding workflows. I am always confused about workflows because there seem to be so many different ones. Can you suggest something more standard for taxonomic and functional classification?
I have understood two workflows from going through the literature.
Workflow 1: QC --> Contamination removal --> Assembly --> Remapping to get coverage, and gene prediction (Prodigal) --> Binning (MaxBin, CONCOCT) --> Taxonomic classification (Kraken, mOTUs, MetaPhlAn2) and functional classification (HUMAnN2)
Workflow 2: QC --> Contamination removal --> Taxonomic and functional classification
My questions for workflow 1: why do we do binning? And can you suggest something for functional profiling? I am not clear about functional classification: what are the inputs, which tools are used, and so on?
For workflow 2: can we do binning directly after contamination removal and then perform classification, or is this the normal way?
Thank you in advance