Question

Metagenomics assembly comparison


5.7 years ago

ARich ▴ 130

Dear Biostars users,

I have a question regarding metagenomics assembly stats. I ran megahit and metaSPAdes on one paired-end sample, then ran metaQUAST to see which of the two assemblers gives better statistics.

The output is a bit confusing, and I am not sure which assembler I should choose for all samples.
Below are tables with some of the metaQUAST results.

num_contigs.xlsx

| Assemblies                       | megahit | SPAdes |
|----------------------------------|---------|--------|
| Bacteroides_acidifaciens         | 179     | 112    |
| Dorea_sp._5_2                    | 107     | 78     |
| Lactobacillus_johnsonii          | 24      | 14     |
| Lactobacillus_johnsonii_DPC_6026 | 26      | 21     |
| Lactobacillus_johnsonii_FI9785   | 13      | 6      |
| Lactobacillus_murinus            | 20      | 11     |
| Lactobacillus_reuteri            | 38      | 23     |
| Lactobacillus_reuteri_TD1        | 27      | 12     |

Misassembled_contigs_length

| Assemblies                       | megahit | SPAdes  |
|----------------------------------|---------|---------|
| Bacteroides_acidifaciens         | 2082038 | 1727134 |
| Dorea_sp._5_2                    | 173212  | 129559  |
| Lactobacillus_johnsonii          | 208152  | 153022  |
| Lactobacillus_johnsonii_DPC_6026 | 231804  | 205378  |
| Lactobacillus_johnsonii_FI9785   | 126758  | 96387   |
| Lactobacillus_murinus            | 24519   | 12355   |
| Lactobacillus_reuteri            | 93062   | 50979   |
| Lactobacillus_reuteri_TD1        | 67949   | 39834   |

Largest_contig.txt

| Assemblies                       | megahit | SPAdes |
|----------------------------------|:-------:|-------:|
| Bacteroides_acidifaciens         |  349575 | 117688 |
| Dorea_sp._5_2                    |  150971 | 199679 |
| Lactobacillus_johnsonii          |  46811  |  54841 |
| Lactobacillus_johnsonii_DPC_6026 | 46811   | 54841  |
| Lactobacillus_johnsonii_FI9785   | 46811   | 54841  |
| Lactobacillus_murinus            | 6067    | 5492   |
| Lactobacillus_reuteri            | 26761   | 25341  |
| Lactobacillus_reuteri_TD1        | 26761   | 13863  |
| not_aligned                      | 244235  | 132156 |

I think megahit performed better in terms of contig length, but I need feedback from an expert.

Looking forward to your feedback! Thank you in advance!

Assembly • 3.2k views

updated 5.7 years ago by h.mon 35k • written 5.7 years ago by ARich ▴ 130


Unfortunately, your question is not that simple to answer. For contig length, the "best" assembler depends on the species. For misassembled contig length, however, SPAdes performs better than megahit for every species, so looking at just these two metrics, I would say SPAdes is better. You should still evaluate other quality metrics, such as the number of genes annotated, the percentage of reads mapping back to the metagenome, and so on. Two papers to help you out:

Critical Assessment of Metagenome Interpretation—a benchmark of metagenomics software

Comparing and Evaluating Metagenome Assembly Tools from a Microbiologist’s Perspective - Not Only Size Matters!
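The per-species comparison above can be sketched in a few lines of Python. This is only a minimal illustration: the numbers are copied from the Largest_contig and Misassembled_contigs_length tables in the question (the `not_aligned` row is left out because it is not a reference genome), and the helper name `tally` and the "higher is better" / "lower is better" conventions are my own assumptions, not something metaQUAST reports.

```python
# Values copied from the metaQUAST tables in the question,
# stored as (megahit, SPAdes) per reference genome.
largest_contig = {
    "Bacteroides_acidifaciens": (349575, 117688),
    "Dorea_sp._5_2": (150971, 199679),
    "Lactobacillus_johnsonii": (46811, 54841),
    "Lactobacillus_johnsonii_DPC_6026": (46811, 54841),
    "Lactobacillus_johnsonii_FI9785": (46811, 54841),
    "Lactobacillus_murinus": (6067, 5492),
    "Lactobacillus_reuteri": (26761, 25341),
    "Lactobacillus_reuteri_TD1": (26761, 13863),
}

misassembled_length = {
    "Bacteroides_acidifaciens": (2082038, 1727134),
    "Dorea_sp._5_2": (173212, 129559),
    "Lactobacillus_johnsonii": (208152, 153022),
    "Lactobacillus_johnsonii_DPC_6026": (231804, 205378),
    "Lactobacillus_johnsonii_FI9785": (126758, 96387),
    "Lactobacillus_murinus": (24519, 12355),
    "Lactobacillus_reuteri": (93062, 50979),
    "Lactobacillus_reuteri_TD1": (67949, 39834),
}


def tally(table, higher_is_better):
    """Count, per assembler, the references on which it has the better value.

    `tally` is a hypothetical helper for this illustration only.
    """
    wins = {"megahit": 0, "SPAdes": 0, "tie": 0}
    for megahit, spades in table.values():
        if megahit == spades:
            wins["tie"] += 1
        elif (megahit > spades) == higher_is_better:
            wins["megahit"] += 1
        else:
            wins["SPAdes"] += 1
    return wins


# Assumption: a longer largest contig is better, a shorter misassembled length is better.
print("largest contig:", tally(largest_contig, higher_is_better=True))
print("misassembled length:", tally(misassembled_length, higher_is_better=False))
```

On these tables the largest-contig tally is split evenly between the two assemblers (4 references each), while SPAdes has the lower misassembled length on all 8 references. A win count like this ignores how large each difference is, so treat it as a quick summary rather than a ranking.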

5.7 years ago by h.mon 35k


Thank you for these papers. They were really helpful. I have another question regarding workflows. I am always confused because each study uses its own workflow. Can you suggest something more standard for taxonomic and functional classification?

From the literature, I have understood two workflows. Workflow 1: QC --> Contamination removal --> Assembly --> Remapping to get coverage, and gene prediction (Prodigal) --> Binning (MaxBin, CONCOCT) --> Taxonomic classification (Kraken, mOTUs, MetaPhlAn2) and functional classification (HUMAnN2)

Workflow 2: QC --> Contamination removal --> Taxonomic and functional classification.

My question about workflow 1: why do we do binning? And can you suggest something for functional profiling? I am not clear about functional classification: what are the inputs, which tools are used, and so on?

For workflow 2: can we do binning directly after contamination removal and then perform classification, or is this the normal way?

Thank you in advance

5.6 years ago by ARich ▴ 130