Hi all,
First of all, let's say I have 5 samples (S1, S2 ... S5), and I have duplicates for each sample (S1a, S1b, S2a, S2b ... S5b). They are paired-end 2x150 bp fastq files sequenced on the Next500 Nextera platform.
Below is the plan for the metagenomics analysis:
1. QC: Use bbduk and trimmomatic for quality control
2. FastQC to check quality
3. Assemble with Megahit
4. Alignment and mapping reads with bwa/bbmap
5. Binning
6. Use Prodigal for functional gene annotation of the assembled contigs
7. Quantifying the annotated genes of the metagenome and export into a tsv file
And here are the questions: 1. I know I will have to concatenate the duplicate samples (concat S1a and S1b together), but can I concatenate all 10 files together prior to Megahit assembly, and somehow still separate the samples afterwards so I know which sample each quantified gene comes from?
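To make this question more concrete, here is a minimal Python sketch of the per-sample bookkeeping I have in mind: replicates are merged per sample, and each sample keeps its own merged files so that reads could still be mapped back sample by sample after a co-assembly. The file naming (S1a_R1.fastq.gz, S1a_R2.fastq.gz, ...) and the function names are assumptions of mine for illustration, not something from a real tool.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Assumed (hypothetical) naming scheme: S1a_R1.fastq.gz, S1a_R2.fastq.gz, ...
NAME_RE = re.compile(r"^(S\d+)([a-z])_(R[12])\.fastq\.gz$")


def group_replicates(filenames):
    """Return {(sample, read): [replicate files, sorted by name]}."""
    groups = defaultdict(list)
    for name in filenames:
        match = NAME_RE.match(Path(name).name)
        if match is None:
            continue  # skip files that do not follow the assumed pattern
        sample, _replicate, read = match.groups()
        groups[(sample, read)].append(name)
    # Sorting keeps R1 and R2 in the same replicate order, so pairs stay in sync.
    return {key: sorted(files) for key, files in groups.items()}


def concatenate(files, out_path):
    """Append files byte by byte; gzip members can be written back to back."""
    with open(out_path, "wb") as out:
        for path in files:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
```

With this layout, something like `concatenate(groups[("S1", "R1")], "S1_R1.fastq.gz")` would give one merged file per sample and read direction, and the sample name stays attached to the file.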
The reason I wanted to concatenate samples together is that I get slightly higher mapping rates with larger samples. What is the usual mapping rate for de novo assembly? I am only getting mapping rates of ~30-45%. Is it normal for de novo assembly? And how can I improve the mapping rate?
I downloaded a script that removes all contigs shorter than 1000 bp right after assembly. Should I run it before or after mapping reads? (Generally, keeping only contigs >1 kbp may make binning draft genomes easier.)
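For reference, the filtering step amounts to something like the following. This is a minimal Python sketch, not the script I downloaded; the function names and the default cutoff of 1000 are placeholders chosen for illustration.

```python
def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(in_handle, out_handle, min_len=1000):
    """Write contigs of at least min_len bases; return how many were kept."""
    kept = 0
    for header, seq in read_fasta(in_handle):
        if len(seq) >= min_len:
            # Each kept sequence is written on a single line.
            out_handle.write(f">{header}\n{seq}\n")
            kept += 1
    return kept
```

It would be called with two open text files, for example the assembler's contig FASTA as input and a new file as output.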
Any recommendations for programs that can annotate a metagenome against the KEGG, COG and CAZy databases? The web-based databases cannot handle the large size of my samples.
Thank you in advance! I am new to metagenomics analysis, so feel free to correct me if I am wrong! :)
Cheers
Alan
Can you clarify these: What are your organisms? Fungi, bacteria? What do you want to do via metagenomics? Species diversity or identification in your samples?
They are microbial mat samples, mainly bacteria and archaea. I mainly want to do 2 things: the first is to identify and quantify functional genes in the metagenome, and the second is to produce draft genomes (binning).
One of the things that keeps bugging me is the mapping rate. I have used different programs for assembly (idba, megahit) with different parameters, and different mapping programs (bowtie2, bwa and bbmap), but the mapping rate is still ~30-45%. Is this normal for de novo assembly?
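In case it matters how the rate is defined, here is a minimal Python sketch that computes it as mapped / total from the text output of `samtools flagstat`. I am assuming the usual flagstat layout, with an "in total" line and a "mapped (" line, and only the QC-passed counts are used; the function name is made up for illustration. Note that these flagstat counts also include secondary and supplementary alignments, so the number is per alignment record rather than strictly per read.

```python
import re

# Assumed flagstat line layout, e.g.:
#   1000 + 0 in total (QC-passed reads + QC-failed reads)
#   400 + 0 mapped (40.00% : N/A)
TOTAL_RE = re.compile(r"^(\d+) \+ \d+ in total")
MAPPED_RE = re.compile(r"^(\d+) \+ \d+ mapped \(")


def mapping_rate(flagstat_text):
    """Return the percentage of QC-passed records reported as mapped."""
    total = mapped = None
    for line in flagstat_text.splitlines():
        match = TOTAL_RE.match(line)
        if match:
            total = int(match.group(1))
            continue
        # Anchoring on the counts means "primary mapped" lines do not match.
        match = MAPPED_RE.match(line)
        if match:
            mapped = int(match.group(1))
    if not total or mapped is None:
        raise ValueError("no usable 'in total' and 'mapped' lines found")
    return 100.0 * mapped / total
```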
cheers
Alan