Question

Interpretation of DESeq2 DE-analysis


3.3 years ago

Vladimir Leshuk ▴ 50

I'm trying to learn how to conduct RNA-seq differential expression analysis. I used data from this site, specifically this dataset on the mouse mammary gland: samples were collected from two cell types (basal and luminal) from mice with different "sexual experience" (virgin, lactate, and pregnant), and counts were obtained for each gene in each sample.

I loaded the data into R, created a DESeq2 object, and specified a design formula:

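library(DESeq2)

info <- read.csv('./SampleInfo.txt', sep='\t')
data <- read.csv('./GSE60450_LactationGenewiseCounts.txt', sep='\t')
# Assumption: the first two columns hold the gene ID and gene length; the rest are per-sample counts
countdata <- as.matrix(data[, -(1:2)])
rownames(countdata) <- data[, 1]
# relevel() needs factors; read.csv() returns character columns in R >= 4.0
info$Status <- relevel(factor(info$Status), ref='virgin')
info$CellType <- relevel(factor(info$CellType), ref='basal')
SampleTable <- data.frame(sex_exp=info$Status, cell=info$CellType)
data_dseq <- DESeqDataSetFromMatrix(countData = countdata, colData = SampleTable,
                                    design = ~ cell + sex_exp + cell:sex_exp)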

Then I ran the function that estimates DE:

data_DE <- DESeq(data_dseq)

The main question arose when I tried to get specific results about DE. If we call this function

resultsNames(data_DE)

we get

[1] "Intercept"                   "cell_luminal_vs_basal"      
[3] "sex_exp_lactate_vs_virgin"   "sex_exp_pregnant_vs_virgin" 
[5] "cellluminal.sex_explactate"  "cellluminal.sex_exppregnant"

and with these 'names' we can extract the logFC and corresponding p-adj for the genes in our dataset (with the results() function). As far as I understand, these logFC and p-adj values come from comparisons of specific groups that we define through the formula.

Could you check whether I understand the information behind these names correctly? I also still have some gaps and would appreciate help filling them :)

1. "Intercept". Because the reference levels of the factors are 'virgin' and 'basal', this term will contain information about gene expression in basal cells of virgin mice. But I'm not sure what DESeq2 compares this group with, or what logFC means here.
2. "cell_luminal_vs_basal". Here we leave 'virgin' unchanged and change 'basal' to 'luminal'. This means that logFC describes the difference in gene expression between luminal and basal cells in virgin mice.
3. "sex_exp_lactate_vs_virgin" "sex_exp_pregnant_vs_virgin". These two are similar to the previous one, but 'basal' stays unchanged and 'virgin' changes to 'lactate' and 'pregnant' respectively. This means that in the first case logFC is the difference in gene expression in basal cells of lactating vs virgin mice (pregnant vs virgin in the second).
4. "cellluminal.sex_explactate" "cellluminal.sex_exppregnant". I'm not sure, but it seems to me that logFC describes the change in gene expression in luminal cells between lactating and virgin mice in the first case, and between pregnant and virgin mice in the second.

Could you please check these definitions? Am I right?

I derived these statements from regression analysis. In R, the output of lm() gives an Intercept (if we have categorical and numeric predictors), which is the predicted value of the response when all numeric predictors are zero and all categorical predictors are at their first level. I tried to extrapolate this to the DESeq2 output, and I'm not sure how reliable that is.

Thanks for your help and time :)

differential DESeq2 expression R • 1.7k views

ADD COMMENT • link updated 2.7 years ago by 1769mkc ★ 1.2k • written 3.3 years ago by Vladimir Leshuk ▴ 50

score 0 · Answer 1 · 2021-08-02

Hi,

You are completely right in comparing the lm() syntax and vocabulary with DESeq2. Internally, DESeq2 uses an extension of linear regression (a generalized linear model of the negative binomial family), so the basic concepts and terminology remain the same. Now to answer your specific questions:

1. As you wrote, the intercept refers to gene expression in the samples at the reference levels of the factors (virgin and basal in your case). It is not compared to other samples: the coefficient is simply the log2 of the fitted mean normalized count in that reference group (not of baseMean, which averages over all samples). The p-value tests whether this coefficient differs from 0, i.e. whether the mean normalized count differs from 1, so it is rarely of interest.
2. Exactly!
3. Exactly!
4. No. These are the interaction terms of your model (cell:sex_exp in the design). The first measures how the cell type effect (luminal vs basal) differs between lactating and virgin mice; the second does the same for pregnant vs virgin mice. See ?results for more examples of how interactions can be used to specify contrasts. For instance, the cell type effect in lactating mice is, by definition, the main cell type effect (cell_luminal_vs_basal) plus the interaction between this effect and sex_exp (cellluminal.sex_explactate).
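
To make the "main effect + interaction" logic concrete, here is a minimal sketch in base R (no DESeq2 needed). It builds a toy table with one row per group, using the same factor and reference levels as your design, and prints the design matrix. The `groups` table and its row names are made up for illustration. The base R column names differ slightly from the DESeq2 ones: `cellluminal` corresponds to `cell_luminal_vs_basal` and `cellluminal:sex_explactate` to `cellluminal.sex_explactate`.

```r
# Toy design: one row per group, same factor levels and reference levels as in the question.
groups <- expand.grid(
  cell    = factor(c("basal", "luminal"), levels = c("basal", "luminal")),
  sex_exp = factor(c("virgin", "lactate", "pregnant"),
                   levels = c("virgin", "lactate", "pregnant"))
)
rownames(groups) <- paste(groups$cell, groups$sex_exp, sep = "_")

# Design matrix for the same formula as in the question.
# Each row shows which coefficients add up to that group's fitted log2 expression.
X <- model.matrix(~ cell + sex_exp + cell:sex_exp, data = groups)
print(X)

# Luminal vs basal within lactating mice = difference between the two group rows.
lum_vs_bas_lactate <- X["luminal_lactate", ] - X["basal_lactate", ]
print(lum_vs_bas_lactate[lum_vs_bas_lactate != 0])
```

The only non-zero entries in that difference are the main cell type effect and the luminal:lactate interaction, which is why the two coefficients have to be summed. In DESeq2 the same sum can be requested with results(data_DE, contrast=list(c("cell_luminal_vs_basal", "cellluminal.sex_explactate"))), following the interaction example in ?results.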