Effect of Bootstrapping/Gibbs Sampling in Salmon Counts
2.6 years ago
saipra003 ▴ 20

Hi Everyone, I am a bit confused about the difference between Gibbs sampling and bootstrapping in Salmon, and about how these procedures affect downstream analysis. For context, I am analyzing 49 matched cancer vs. normal RNAseq samples in the context of alternative splicing (i.e. I am trying to cluster together patients with similar alternative splicing profiles and then see which genes are driving the clustering). I read that bootstrapping and Gibbs sampling improve transcript quantification for downstream analysis, but I am unsure how dramatic this effect may be for my purpose. Any advice or help in this regard would be appreciated!

Gibbs Bootstrapping RNAseq Salmon Sampling • 2.0k views
2.6 years ago
Rob 6.9k

To be clear, enabling bootstrapping or Gibbs sampling does not change the “primary estimate” (i.e. the TPM or NumReads in the quant.sf files) at all. Rather, bootstrapping and Gibbs sampling are both ways to estimate _posterior uncertainty_. That is, when salmon estimates a particular abundance for a transcript in a sample (say, transcript A produced 500 fragments), sometimes there can be a high degree of certainty in this estimate and other times a lot of uncertainty. For example, if all 500 fragments assigned to this transcript map uniquely back to it, uncertainty will be very low. On the other hand, if this transcript has a near-identical splice variant or an allelic variant and all or almost all of these reads are multi-mapping, the uncertainty may be quite high.
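To make the contrast concrete, here is a toy sketch in Python. It is not salmon's actual algorithm: it assumes just two transcripts, A and B, with equal effective lengths, and fragments that are either unique to A, unique to B, or equally compatible with both. Under those assumptions the maximum likelihood estimate for A has a closed form, so we can bootstrap it directly. All function names and counts are made up for illustration.

```python
import random
import statistics

def estimate_a(n_a, n_b, n_ab):
    """Toy ML estimate of the number of fragments from transcript A.

    Assumes two equal-length transcripts; ambiguous fragments are split in
    proportion to the uniquely mapping evidence.
    """
    total = n_a + n_b + n_ab
    unique = n_a + n_b
    if unique == 0:
        return total / 2  # no unique evidence: the split is not identifiable
    return total * n_a / unique

def bootstrap_a(n_a, n_b, n_ab, n_boot=200, seed=0):
    """Resample fragments with replacement and re-estimate A each time."""
    rng = random.Random(seed)
    total = n_a + n_b + n_ab
    estimates = []
    for _ in range(n_boot):
        draw = rng.choices(["A", "B", "AB"], weights=[n_a, n_b, n_ab], k=total)
        estimates.append(
            estimate_a(draw.count("A"), draw.count("B"), draw.count("AB"))
        )
    return estimates

# Both scenarios give the same point estimate of 500 fragments for A.
point_unique = estimate_a(500, 500, 0)      # every fragment maps uniquely
point_ambiguous = estimate_a(5, 5, 990)     # almost every fragment multi-maps

sd_unique = statistics.stdev(bootstrap_a(500, 500, 0))
sd_ambiguous = statistics.stdev(bootstrap_a(5, 5, 990))

print(f"unique:    estimate={point_unique:.0f}, bootstrap sd={sd_unique:.1f}")
print(f"ambiguous: estimate={point_ambiguous:.0f}, bootstrap sd={sd_ambiguous:.1f}")
```

The point estimate is identical in both scenarios, but the spread of the bootstrap replicates should be far wider when nearly all fragments are ambiguous. That spread is the extra information the replicates carry.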

The primary estimates used in most common downstream analyses are “point” estimates. That is, in this case, they are maximum likelihood estimates with no notion of their uncertainty. Bootstrapping and Gibbs sampling are two different ways to estimate the uncertainty of each abundance point estimate. They generate information that downstream analysis tools can use to assess not just what the best estimate of abundance is for a transcript in a sample, but how certain we are of that abundance. However, not all downstream tools take advantage of this information. For example, if you are performing a differential analysis, a tool like swish will take advantage of this information, but e.g. DESeq2 or edgeR will not. So, you can always look at the variance of the bootstrap replicates or Gibbs samples to manually assess the confidence in a transcript’s expression, but if you want to make use of this information systematically in downstream analysis, you need to find a tool for your chosen task that takes advantage of it.
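If you do want to eyeball the replicates yourself, a minimal sketch of the kind of summary I mean is below. It assumes you have already loaded the bootstrap replicates or Gibbs samples for one sample into plain Python lists, one list per transcript; loading salmon's replicate output is not shown. The transcript names and values are invented for illustration.

```python
import statistics

# Hypothetical replicate counts for two transcripts in one sample.
replicates = {
    "tx_unique": [500, 498, 503, 501, 499, 497],
    "tx_ambiguous": [500, 120, 860, 310, 700, 455],
}

def summarize(samples):
    """Mean, sample variance, and coefficient of variation of the replicates."""
    mean = statistics.mean(samples)
    var = statistics.variance(samples)
    cv = (var ** 0.5) / mean if mean > 0 else float("nan")
    return {"mean": mean, "variance": var, "cv": cv}

summary = {name: summarize(values) for name, values in replicates.items()}

for name, stats in summary.items():
    print(f"{name}: mean={stats['mean']:.1f}, "
          f"variance={stats['variance']:.1f}, cv={stats['cv']:.3f}")
```

A transcript whose replicates have a large coefficient of variation relative to others of similar abundance is one whose point estimate deserves less trust, for example when deciding which transcripts to feed into clustering.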

Thank you, this makes much more sense!

Swish also has tools for computing and visualizing uncertainty metrics; see the vignette here:

https://mikelove.github.io/fishpond/articles/swish.html

13 months ago
Gordon Smyth ★ 7.7k

edgeR uses the Salmon bootstrap samples to assess differential transcript expression. edgeR has had this functionality since October 2018 but we have only this year written up a formal manuscript with performance comparisons, see
