Question

Comparison of ratio of expression between two genes across samples

1

Entering edit mode

4.1 years ago

guillepalou4 ▴ 20

Hi guys,

I have been reading a lot about within-sample and between-sample RNA-seq normalization but I still haven't found an answer for my question.

My data consists of an RNA-seq matrix with Transcripts-Per-Million (TPM) values, so it is within-sample normalized. I want to compare the expression of two genes within a sample (expression gene 1 / expression gene 2), and then compare the resulting ratio between samples (from the same group, I do not have different conditions or experiments, this is not my purpose). My guess is that, as I am comparing relative abundances between genes within a sample, I can compare these ratios between samples without a between-sample normalization (like DESeq2 or TMM). But I am not 100% sure. Do I need this normalization or I can compare samples directly with TPM in this particular case?

Thank you!

Guille

RNA-Seq TPM normalization relative abundance • 1.6k views

ADD COMMENT • link updated 3.0 years ago by Martin Modrák ▴ 80 • written 4.1 years ago by guillepalou4 ▴ 20

score 0 · Answer 1 · 2021-07-09

Hi, I faced a similar issue. I think you can use DESeq2 as a starting point, but then need to do some manual adjustments. I posted to DESeq2 support at https://support.bioconductor.org/p/9138190/ but so far got no response whether the authors agree it is a sensible approach, so some caution is warranted.

EDIT: Mike Love suggested using controlGenes when running estimateSizeFactors in front of DESeq() instead of the method below. Both have some advantages, see the above-linked bioconductor thread for details.

Assuming a design of counts ~ condition the null hypothesis is that for each gene A and a fixed reference B: log(mu_A_condition1) - log(mu_B_condition1) >= log(mu_A_condition2) - log(mu_B_condition2).

My understanding is that this null hypothesis can actually be tested with DESeq2 with a little tweak, because with dummy coding it translates to beta_A_condition2 - beta_B_condition2 <= 0, so I can get the maximum likelihood estimate by just subtracting the two beta values given by DESeq2, get the SE by taking sqrt(beta_A_SE^2 + beta_B_SE_^2) (as the betas should have 0 covariance, right?) which is all I need to compute the Wald statistic and then I can once again carry on the same way DESeq2 does (p-adjustment, shrinkage, ...)

Here's my R code that performs this - you give the value of results(dds) for the coefficient you are trying to compare and the name of the reference gene. It then computes estimates for all genes against the reference, performs p-value adjustments and shrinkage with ashr

change_from_reference <- function(results, reference, altHypothesis = "greater" , threshold = 0) {

  reference_beta <- results[reference, "log2FoldChange"]
  reference_beta_SE <- results[reference, "lfcSE"]
  res_others <- results[rownames(results) != reference,]
  LFC <- res_others$log2FoldChange - reference_beta
  SE <- sqrt(res_others$lfcSE ^ 2 + reference_beta_SE ^ 2)
  T <- threshold

  #Taken from on https://github.com/mikelove/DESeq2/blob/a5941b305b597556d47c68df0b922a0a20b41fb6/R/results.R#L481
  pfunc <- function(q) pnorm(q, lower.tail=FALSE) 

  if (altHypothesis == "greaterAbs") {
      newStat <- sign(LFC) * pmax((abs(LFC) - T)/SE, 0)
      newPvalue <- pmin(1, 2 * pfunc((abs(LFC) - T)/SE))
    } else if (altHypothesis == "lessAbs") {
      newStatAbove <- pmax((T - LFC)/SE, 0)
      pvalueAbove <- pfunc((T - LFC)/SE)
      newStatBelow <- pmax((LFC + T)/SE, 0)
      pvalueBelow <- pfunc((LFC + T)/SE)
      newStat <- pmin(newStatAbove, newStatBelow)
      newPvalue <- pmax(pvalueAbove, pvalueBelow)
    } else if (altHypothesis == "greater") {
      newStat <- pmax((LFC - T)/SE, 0)
      newPvalue <- pfunc((LFC - T)/SE)
    } else if (altHypothesis == "less") {
      newStat <- pmin((LFC + T)/SE, 0)
      newPvalue <- pfunc((-T - LFC)/SE)
    } else {
      stop("Invalid altHypothesis")
    }

  ashrfit <- ashr::ash(LFC, SE,
                     mixcompdist="normal", method="shrink")
  data.frame(gene = rownames(res_others), 
             estimate = LFC, shrunkEst = ashrfit$result$PosteriorMean, 
             shrunkSE = ashrfit$result$PosteriorSD, p.adj = p.adjust(newPvalue, method = "BH")) 
}