Hey everyone.
This might be a trivial question, but I ran into this trying to explain myself why I didn't compare the genes found in GSEA's leading edge analysis additionally with TPMs or the normalized counts to have quasi single-sample comparison.
As I didn't have an appropriate answer to why except 'Why compare non-normalized values in TPMs' and nothing really for norm. Counts, I have started to compare my genes of interest from GSEA terms to see which genes specifically show differences in the pathway.
Now I ran into the same issue as others have before. My normalized counts do not always reflect the log2FoldChange. I.e my Sample vs Control l2FC is -1.5, but my norm. Counts for my Samples are (let's say) "500, 450, 510, 520", while my control counts are "210, 250, 180, 184"
While I know that DESeq2 uses internal variables to calculate the foldchange, I would like to know how I could explain this discrepancy.
I have already checked that I am indeed normalizing the counts in the code, and the factors are set correctly as well (see code attached)
Thank you all so much!
\# Split DESeq2 command in case of batch-effect correction if "run" has more than one state
if(batchNrCheck > 1) {
ddsTxi <- DESeqDataSetFromTximport(txi,
colData = samples_subset,
design = ~ group + run) # //Added batch-effect correction to design
} else
{
ddsTxi <- DESeqDataSetFromTximport(txi,
colData = samples_subset,
design = ~ group)
}
ddsTxi$group <- factor(ddsTxi$group, levels = c(groupA,groupB)) #factor level determines which is reference (groupA)
dds <- DESeq(ddsTxi)
res <- results(dds)
vsd <- vst(dds, blind = T)
normCounts<-as.data.frame(counts(dds, normalized=TRUE))
You at least need to show the normalized counts. Showing raw counts is pointless. You're also correcting for
run
which might have an impact.Yeah, showing raw TPMs seemed quite pointless in my eyes as well.
For the batch correction through 'run' - it actually doesn't impact it in this case, as the data comes from the same run, so the design = ~ group only.
I just find it quite vexing, that for most genes, the trend of positive or negative log2FoldChange correlates with what I see in the normalized counts, but for a group of genes (in mitochondrial function) specifically, it shows the polar opposite.
What is "raw TPMs"? There is raw counts, as inputted to DESeq2. There is TPM, which is essentially raw counts divided by total counts and gene length in kilobases. That would be floats. You show integers, suggesting raw counts. You should plot either
counts(dds, normalized = TRUE)
or the output ofassay(vst(dds))
to approximate the DESeq2 results.Or use
DESeq2::plotCounts()
which does that internally. Can you show some conflicting genes like this?My bad carelessly throwing the term "raw" around. The TPMs are as you said calculated and result in floats. This is done actually prior to DESeq2 as we use Salmon, which gives the resulting TPMs as 'abundance'.
The .csv file resulting from
counts(dds, normalized = TRUE)
is the one I would like to use as single-sample comparison to the log2FoldChangeOne example would be SDHB - I have negative log2FoldChange of -0.643, but the norm.counts are (rounded): For Control: 1070, 1290, 550 For Sample: 1230, 1870, 960, 730
For the plotCounts result: For Control: 1516, 1860, 800 For Sample: 1411, 733, 555, 932
So the plotCounts does indeed give me more unifying results. Now I am quite confused
Now you're mixing transcript abundances (TPMs from salmon) which is what salmon produces with gene level data which is what DESeq2 assumes. Did you run tximport?
Yes. tximport ran regularly.
So I import it and take the import$abundance