I have multiple samples (interleaved reads) that were co-assembled into a single assembly, final.contigs.fa. The downstream goal is to analyze gene distribution among the samples (multivariate statistics, etc.). The first step is to map the reads from each sample back onto final.contigs.fa with bowtie2. I did that and got SAM files, which I converted to sorted BAM files. Now I am trying to determine coverage. Questions:
Q1: Assembly coverage. My friend asks: what's your coverage? He means it as an assembly quality measure, a single easy number like 30X. This post explores tools to get such a number from mpileup results.
So, do I just concatenate all my BAM files and run samtools mpileup concatenated.bam, or do I run samtools mpileup *.bam? Please help me out.
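To make the kind of number I am after concrete, here is a minimal sketch. It assumes samtools depth -a instead of mpileup (my substitution, since depth already prints one depth column per BAM file and accepts several BAM files at once), and mean_depth is a helper name I made up. It sums the depth over all sample columns at each position and divides by the number of positions, which is what I understand by a pooled "30X"-style figure.

```shell
# Hypothetical helper: reads tab-separated "contig, position, depth1[, depth2, ...]"
# lines on stdin (the layout printed by samtools depth) and prints the mean of
# the summed per-position depth, i.e. pooled coverage across all input files.
mean_depth() {
    awk -F'\t' '
        { for (i = 3; i <= NF; i++) total += $i; n++ }   # add every sample column
        END { if (n > 0) printf "%.2f\n", total / n; else print "NA" }
    '
}

# Intended use (assumes samtools is installed and the BAM files are sorted):
#   samtools depth -a sample1.bam sample2.bam sample3.bam | mean_depth
```

The -a flag matters here as far as I can tell: without it, positions with zero depth are left out of the output, so the average would be taken only over covered positions and come out too high.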
Q2: Per-sample coverage. Following up on this old post, is there a difference between
samtools mpileup (options) sample1.bam sample2.bam sample3.bam
and
samtools mpileup (options) sample1.bam
samtools mpileup (options) sample2.bam
samtools mpileup (options) sample3.bam
in the way coverage is calculated? (The linked OP asked about variant calling.)
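For the per-sample case, this is roughly what I would compute, again sketched with samtools depth -a rather than mpileup (my assumption, not something taken from the linked post). The helper name per_sample_mean_depth is made up, and the labels sample1, sample2, ... just follow the order of the BAM files on the command line.

```shell
# Hypothetical helper: reads tab-separated "contig, position, depth1[, depth2, ...]"
# lines on stdin and prints one mean depth per sample column.
per_sample_mean_depth() {
    awk -F'\t' '
        {
            for (i = 3; i <= NF; i++) sum[i] += $i   # running total per column
            if (NF > cols) cols = NF
            n++                                      # number of positions seen
        }
        END {
            for (i = 3; i <= cols; i++)
                printf "sample%d\t%.2f\n", i - 2, sum[i] / n
        }
    '
}

# Intended use (assumes samtools is installed and the BAM files are sorted):
#   samtools depth -a sample1.bam sample2.bam sample3.bam | per_sample_mean_depth
```

If the joint run and the three separate runs are equivalent for coverage, I would expect each column's mean here to match the mean from running the same command on that BAM file alone.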
Lastly, any opinions on what is "good coverage"? For example, if each sample has 5X-10X coverage, is that good enough?