Question

metagenomic assembly coverage, for multiple samples

7.3 years ago

willnotburn ▴ 50

I have multiple samples (interleaved reads) that were co-assembled into a single assembly, final.contigs.fa. The downstream goal is to analyze gene distribution among the samples, run multivariate stats, etc. The first step is to map the reads from each sample back onto final.contigs.fa with bowtie2. I did that and got SAM files, which I converted to sorted BAM files. Now I am trying to determine coverage. Questions:

Q1: Assembly coverage. My friend asks: what's your coverage? He means it as an assembly quality measure, a single easy number like 30X. This post explores tools to get such a number from mpileup results.

So, do I just concatenate all my BAM files and run samtools mpileup concatenated.bam, or maybe samtools mpileup *.bam? Please help me out.
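
To make the "single number" concrete, here is a minimal sketch, assuming that "coverage" means mean per-base depth over the whole co-assembly with all samples pooled. It assumes the per-position depths were written with samtools depth -a (tab-separated: contig, position, then one depth column per BAM). The function name pooled_mean_depth and the file name depth.tsv are hypothetical, and this is only one possible definition of assembly coverage.

```shell
# pooled_mean_depth FILE
# FILE is assumed to be `samtools depth -a` style output:
#   contig <TAB> position <TAB> depth_bam1 [<TAB> depth_bam2 ...]
# Prints one number: depth summed over all samples, averaged over positions.
pooled_mean_depth() {
    LC_ALL=C awk -F'\t' '
        {
            # add up the depth columns (3..NF) of this position
            for (i = 3; i <= NF; i++) total += $i
            n++
        }
        END {
            if (n == 0) exit 1          # empty input: no mean to report
            printf "%.2f\n", total / n
        }
    ' "$1"
}

# Hypothetical usage (not run here):
#   samtools depth -a sample1.bam sample2.bam sample3.bam > depth.tsv
#   pooled_mean_depth depth.tsv
```

Because the depth columns are summed per position, the result should be the same whether the BAM files are passed together or merged first, as long as the same reads are counted.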

Q2: Per-sample coverage. Following up on this old post, is there a difference between

samtools mpileup (options) sample1.bam sample2.bam sample3.bam

and

samtools mpileup (options) sample1.bam
samtools mpileup (options) sample2.bam
samtools mpileup (options) sample3.bam

in the way coverage is calculated? (The OP of the linked post asked about variant calling.)
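
One way to check this empirically, sketched under the assumption that depth is taken from samtools depth -a rather than parsed out of mpileup: when several BAM files are given, the output has one depth column per file, so a per-sample mean can be computed column by column and compared against the single-file runs. The function name mean_depth and the file name depth.tsv are hypothetical.

```shell
# mean_depth FILE
# FILE is assumed to be `samtools depth -a` style output:
#   contig <TAB> position <TAB> depth_bam1 [<TAB> depth_bam2 ...]
# Prints the mean depth of each sample, tab-separated, in input-file order.
mean_depth() {
    LC_ALL=C awk -F'\t' '
        {
            # accumulate each depth column separately
            for (i = 3; i <= NF; i++) sum[i] += $i
            n++
            cols = NF
        }
        END {
            if (n == 0) exit 1          # empty input: no mean to report
            for (i = 3; i <= cols; i++)
                printf "%s%.2f", (i > 3 ? "\t" : ""), sum[i] / n
            printf "\n"
        }
    ' "$1"
}

# Hypothetical usage (not run here):
#   samtools depth -a sample1.bam sample2.bam sample3.bam > depth.tsv
#   mean_depth depth.tsv
```

If the per-column means from the combined run match the means from running each BAM on its own, the two invocations treat coverage the same way for this purpose.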

Lastly, any opinions on what is "good coverage"? For example, if each sample has 5X-10X coverage, is that good enough?

metagenomics coverage bowtie2 samtools mpileup
