Question

Metagenomic analysis workflow

6.9 years ago

h.l.wong ▴ 70

Hi all,

First of all, let's say I have 5 samples (S1, S2 ... S5), and I have duplicates for each sample (S1a, S1b, S2a, S2b ... S5b). They are paired-end 2x150 bp FASTQ files sequenced on the NextSeq 500 (Nextera) platform.
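If the replicates are pooled per sample, the grouping could be sketched as below. This is only an illustration: the file-name pattern (`S1a_R1.fastq.gz` and so on) and the function names `group_replicates` and `pool` are assumptions, not something given in the post. It relies on the fact that gzip files concatenated byte-wise still form a valid gzip stream, and it sorts by replicate letter so that R1 and R2 are pooled in the same order and read pairs stay in sync.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Assumed naming: <sample><replicate letter>_R<1|2>.fastq.gz, e.g. S1a_R1.fastq.gz
NAME_RE = re.compile(r"^(S\d+)([a-z])_R([12])\.fastq\.gz$")

def group_replicates(filenames):
    """Map (sample, read direction) -> replicate files sorted by replicate letter."""
    groups = defaultdict(list)
    for name in filenames:
        match = NAME_RE.match(Path(name).name)
        if match is None:
            continue  # skip files that do not follow the assumed pattern
        sample, replicate, direction = match.groups()
        groups[(sample, direction)].append((replicate, name))
    return {key: [n for _, n in sorted(vals)] for key, vals in groups.items()}

def pool(files, out_path):
    """Concatenate gzip files byte-wise into one multi-member gzip file."""
    with open(out_path, "wb") as out:
        for path in files:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
```

Calling `pool(groups[("S1", "1")], "S1_R1.fastq.gz")` and the same for direction `"2"` would give one pooled pair of files per sample.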

Below is the plan for the metagenomics analysis:

1. QC: use bbduk and trimmomatic for quality control
2. FastQC to check quality
3. Assemble with Megahit
4. Alignment and mapping reads with bwa/bbmap
5. Binning
6. Use Prodigal for functional gene annotation of the assembled contigs
7. Quantifying the annotated genes of the metagenome and export into a tsv file
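For the last step of the plan, one possible shape for the exported table is a gene-by-sample count matrix. The sketch below assumes that per-gene read counts were already obtained separately for each sample (for example by mapping each sample's reads on their own), which is what keeps the sample identity. The function name `write_count_table`, the gene IDs and the counts are hypothetical.

```python
import csv
import io

def write_count_table(counts_by_sample, handle):
    """Write a gene x sample count matrix as TSV; genes absent from a sample count as 0."""
    samples = sorted(counts_by_sample)
    genes = sorted({g for counts in counts_by_sample.values() for g in counts})
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene"] + samples)
    for gene in genes:
        writer.writerow([gene] + [counts_by_sample[s].get(gene, 0) for s in samples])

# Hypothetical counts: reads of each sample assigned to each predicted gene
example = {
    "S1": {"gene_0001": 12, "gene_0002": 3},
    "S2": {"gene_0001": 7},
}
buffer = io.StringIO()
write_count_table(example, buffer)
print(buffer.getvalue())
```

Passing an open file instead of the `StringIO` buffer writes the same table to disk.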

And here are the questions:

1. I know I will have to concatenate the duplicate samples (concat S1a and S1b together), but can I concatenate all 10 files together prior to Megahit assembly, and somehow keep the samples separate so I know which sample the quantified genes come from?
2. The reason I wanted to concatenate samples together is that I get slightly higher mapping rates with larger samples. What is the usual mapping rate for de novo assembly? I am only getting mapping rates of ~30-45%. Is that normal for de novo assembly? And how can I improve the mapping rate?
3. I downloaded a script that removes all contigs shorter than 1000 bp right after assembly. Should I run it before or after mapping reads? (Generally, contigs >1 kbp may make binning draft genomes easier.)
4. Any recommendations for programs that can annotate a metagenome against the KEGG, COG and CAZy databases? The web-based databases cannot handle the large size of my samples.
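The downloaded script is not shown in the post, so the following is only a minimal sketch of what such a contig length filter might look like, written in plain Python. The function names `read_fasta` and `filter_contigs` are hypothetical, and the 1000 bp cutoff is the one mentioned in the question.

```python
def read_fasta(handle):
    """Yield (header, sequence) tuples from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)

def filter_contigs(in_handle, out_handle, min_length=1000):
    """Copy contigs of at least min_length bp; return (kept, dropped) counts."""
    kept = dropped = 0
    for header, seq in read_fasta(in_handle):
        if len(seq) >= min_length:
            out_handle.write(f">{header}\n{seq}\n")
            kept += 1
        else:
            dropped += 1
    return kept, dropped
```

Called with two open file handles, it writes the retained contigs one sequence per line and reports how many contigs were kept and dropped.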

Thank you in advance! I am new to metagenomics analysis, so feel free to correct me if I am wrong! :)

Cheers

Alan

Assembly alignment • 3.6k views

ADD COMMENT • link updated 6.9 years ago by colindaven 7.0k • written 6.9 years ago by h.l.wong ▴ 70

Can you clarify the following? What are your organisms: fungi, bacteria? What do you want to do via metagenomics: species diversity or identification in your samples?

ADD REPLY • link 6.9 years ago by Mehmet ▴ 820

They are microbial mat samples, mainly bacteria and archaea. I want to do mainly 2 things, the first one is to identify and quantify functional genes in the metagenome, and the second thing is to produce draft genomes (binning).

One of the things that keeps bugging me is the mapping rate. I have used different programs for assembly (idba, megahit) with different parameters, and different mapping programs (bowtie2, bwa and bbmap) but the mapping rate is still at ~30-45%. Is it normal with de novo assembly?
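For reference, the mapping rate discussed here is simply mapped reads divided by total reads. The small sketch below shows that arithmetic per sample and pooled. The read counts are made-up illustrative numbers, not results from this dataset, and `mapping_rate` is a hypothetical helper name.

```python
def mapping_rate(mapped_reads, total_reads):
    """Return the percentage of reads that mapped to the assembly."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * mapped_reads / total_reads

# Hypothetical (mapped, total) read counts per sample
read_counts = {
    "S1": (350_000, 1_000_000),
    "S2": (900_000, 2_000_000),
}

per_sample = {s: mapping_rate(m, t) for s, (m, t) in read_counts.items()}

# The pooled rate weights each sample by its read depth, so it can differ
# from the plain average of the per-sample rates.
pooled = mapping_rate(
    sum(m for m, _ in read_counts.values()),
    sum(t for _, t in read_counts.values()),
)
```

Reporting the rate per sample alongside the pooled value makes it easier to see whether a low overall rate comes from all samples or only from a few.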

cheers

Alan

ADD REPLY • link 6.9 years ago by h.l.wong ▴ 70

score 0 · Answer 1 · 2017-12-22

Combining the samples sounds like a really, really bad idea. You might get better assembly stats, but on an assembly of messed-up chimeras. This is not going to help you.

Also try a read-based binning method (as opposed to a contig-based one). You are likely to get more accurate semiquantitative abundance values with this approach, and it is therefore complementary to what you've done.