Question

Comparability between normalized Counts / TPMs and log2FoldChange

0

Entering edit mode

5 weeks ago

NorbertK ▴ 10

Hey everyone.

This might be a trivial question, but I ran into this trying to explain myself why I didn't compare the genes found in GSEA's leading edge analysis additionally with TPMs or the normalized counts to have quasi single-sample comparison.

As I didn't have an appropriate answer to why except 'Why compare non-normalized values in TPMs' and nothing really for norm. Counts, I have started to compare my genes of interest from GSEA terms to see which genes specifically show differences in the pathway.

Now I ran into the same issue as others have before. My normalized counts do not always reflect the log2FoldChange. I.e my Sample vs Control l2FC is -1.5, but my norm. Counts for my Samples are (let's say) "500, 450, 510, 520", while my control counts are "210, 250, 180, 184"

While I know that DESeq2 uses internal variables to calculate the foldchange, I would like to know how I could explain this discrepancy.

I have already checked that I am indeed normalizing the counts in the code, and the factors are set correctly as well (see code attached)

Thank you all so much!

\# Split DESeq2 command in case of batch-effect correction if "run" has more than one state
 if(batchNrCheck > 1) {

        ddsTxi <- DESeqDataSetFromTximport(txi,
                                           colData = samples_subset,
                                           design = ~ group + run)      # //Added batch-effect correction to design
  } else
  {
        ddsTxi <- DESeqDataSetFromTximport(txi,
                                           colData = samples_subset,
                                           design = ~ group)

  }
 ddsTxi$group <- factor(ddsTxi$group, levels = c(groupA,groupB)) #factor level determines which is reference (groupA)

 dds <- DESeq(ddsTxi)
 res <- results(dds)
 vsd <- vst(dds, blind = T)
 normCounts<-as.data.frame(counts(dds, normalized=TRUE))

FoldChange TPMs DESeq2 normalized RNAseq counts • 368 views

ADD COMMENT • link 5 weeks ago by NorbertK ▴ 10

0

Entering edit mode

You at least need to show the normalized counts. Showing raw counts is pointless. You're also correcting for run which might have an impact.

ADD REPLY • link 5 weeks ago by ATpoint 87k

0

Entering edit mode

Yeah, showing raw TPMs seemed quite pointless in my eyes as well.

For the batch correction through 'run' - it actually doesn't impact it in this case, as the data comes from the same run, so the design = ~ group only.

I just find it quite vexing, that for most genes, the trend of positive or negative log2FoldChange correlates with what I see in the normalized counts, but for a group of genes (in mitochondrial function) specifically, it shows the polar opposite.

ADD REPLY • link 5 weeks ago by NorbertK ▴ 10

1

Entering edit mode

What is "raw TPMs"? There is raw counts, as inputted to DESeq2. There is TPM, which is essentially raw counts divided by total counts and gene length in kilobases. That would be floats. You show integers, suggesting raw counts. You should plot either counts(dds, normalized = TRUE) or the output of assay(vst(dds)) to approximate the DESeq2 results.

Or use DESeq2::plotCounts() which does that internally. Can you show some conflicting genes like this?

ADD REPLY • link 5 weeks ago by ATpoint 87k

0

Entering edit mode

My bad carelessly throwing the term "raw" around. The TPMs are as you said calculated and result in floats. This is done actually prior to DESeq2 as we use Salmon, which gives the resulting TPMs as 'abundance'.

The .csv file resulting from counts(dds, normalized = TRUE) is the one I would like to use as single-sample comparison to the log2FoldChange

One example would be SDHB - I have negative log2FoldChange of -0.643, but the norm.counts are (rounded): For Control: 1070, 1290, 550 For Sample: 1230, 1870, 960, 730

For the plotCounts result: For Control: 1516, 1860, 800 For Sample: 1411, 733, 555, 932

So the plotCounts does indeed give me more unifying results. Now I am quite confused

ADD REPLY • link 5 weeks ago by NorbertK ▴ 10

0

Entering edit mode

Now you're mixing transcript abundances (TPMs from salmon) which is what salmon produces with gene level data which is what DESeq2 assumes. Did you run tximport?

ADD REPLY • link 5 weeks ago by ATpoint 87k

0

Entering edit mode

Yes. tximport ran regularly.

So I import it and take the import$abundance

ADD REPLY • link 5 weeks ago by NorbertK ▴ 10