Question

How does samtools mpileup handle multiple .bam inputs?

7.9 years ago

traviata ▴ 20

With samtools mpileup you can use multiple .bam files as inputs. When samtools computes depth, are these files simply concatenated, or is there a special way samtools synthesizes the data from multiple .bam files?

The samtools GitHub FAQ seems to touch on multi-sample input, but I wasn't sure how to interpret what it says:

Between single- and multi-sample variant calling, which is preferred?

By using multi-sample calling, we gain power on SNPs shared between samples, but lose power on singleton SNPs. Here is a way of thinking of this. Suppose we have 1% false positive rate (FPR) for variant calling from one sample. If we call SNPs from 100 samples separately and then combine the calls, the FPR would be around 10-20% (not 100% because more SNPs are found given 100 samples). To retain an acceptable FPR on singletons, we have to be more stringent on each sample and thus lose power. Combining single-sample calls naively would not increase power on shared SNPs. This is where multi-sample calling does better: by taking the advantage of correlation between samples, we are able to call a SNP if it appears in multiple samples, but too weak to call in each sample individually. Joint calling is particularly preferable if we have multiple low-coverage samples for which single-sample calling does not work well. It is also able to reveal some artifacts only detectable with many samples.
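The arithmetic in that FAQ passage can be sketched roughly as follows. This is only an illustration under stated assumptions: "FPR" is read here as the fraction of calls that are false, false positives are assumed to be random errors that rarely coincide between samples (so they add up), and the SNP counts (3 million true SNPs per sample, 15 million distinct true SNPs across 100 samples) are hypothetical numbers chosen for the example, not figures from the FAQ.

```python
def combined_fpr(n_samples, true_per_sample, fpr_single, union_true):
    """Fraction of false calls after naively merging single-sample call sets.

    Assumptions (illustrative, not from the samtools FAQ):
    - fpr_single is the fraction of one sample's calls that are false
    - false calls are sample-specific, so they accumulate across samples
    - union_true is the number of distinct true SNPs over all samples,
      which grows more slowly than n_samples because many SNPs are shared
    """
    # If a fraction fpr_single of the calls are false, then
    # false = true * fpr_single / (1 - fpr_single).
    false_per_sample = true_per_sample * fpr_single / (1.0 - fpr_single)
    false_total = n_samples * false_per_sample
    return false_total / (false_total + union_true)


# Hypothetical counts, for illustration only.
single = combined_fpr(1, 3_000_000, 0.01, 3_000_000)
merged = combined_fpr(100, 3_000_000, 0.01, 15_000_000)
print(f"one sample: {single:.1%}, 100 samples merged: {merged:.1%}")
```

With these made-up counts the merged call set lands inside the 10-20% range the FAQ mentions: the false calls grow about 100-fold while the set of true SNPs grows only about 5-fold.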

RNA-Seq samtools • 5.6k views

A similar thread that may be of interest: Samtools: merge and mpileup vs mpileup alone for variant-calling with multiple BAM
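On the depth part of the question, a small sketch may help. In the mpileup text output, the inputs are not concatenated: after the chromosome, position, and reference base, each input file gets its own group of columns (depth, read bases, base qualities, assuming the default output options). The helper name `per_file_depths` and the example line below are made up for illustration; they are not part of samtools.

```python
def per_file_depths(mpileup_line, cols_per_file=3):
    """Split one mpileup text line into (chrom, pos, ref, [depth per input file]).

    Assumes the default mpileup text layout: three leading columns, then
    depth / bases / qualities repeated once per input BAM. Extra per-file
    columns (e.g. from optional output flags) would change cols_per_file.
    """
    fields = mpileup_line.rstrip("\n").split("\t")
    chrom, pos, ref = fields[0], int(fields[1]), fields[2]
    per_file = fields[3:]
    if len(per_file) % cols_per_file != 0:
        raise ValueError("unexpected number of per-file columns")
    # The first column of each per-file group is that file's depth.
    depths = [int(per_file[i]) for i in range(0, len(per_file), cols_per_file)]
    return chrom, pos, ref, depths


# Made-up line as it might look for two input BAMs.
line = "chr1\t100\tA\t3\t.,.\tIII\t2\t.,\tII"
chrom, pos, ref, depths = per_file_depths(line)
print(depths, sum(depths))  # [3, 2] 5
```

So a combined depth across files is something you compute yourself, for example by summing the per-file depth columns as above.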

7.3 years ago by Kevin Blighe 89k