DESeq2 analysis for multiple conditions
3
4
Entering edit mode
9.5 years ago

Hi,

I want to do a differential expression (DE) analysis using DESeq2. My experiment is a small RNA-seq experiment with 5 tissue samples, and I want to find the DE genes between the different tissues.

I have the following count matrix (as an example):

gene     tissue1   tissue2   tissue3   tissue4   tissue5
gene1    233       91        17        593       93
gene2    1011      0         7         1         11
gene3    963       2         3         66        2
gene4    908       41        1         74        33
gene5    596       50        26        328       104
gene6    1         0         0         0         1111
gene7    202       187       35        425       277
gene8    985       24        10        76        33
gene9    523       87        32        286       203
gene10   822       82        23        120       87

My aim is to find the DE genes between each pair of columns, i.e. tissue1 vs tissue2, tissue1 vs tissue3, ... tissue2 vs tissue3, ... tissue4 vs tissue5.

I don't fully understand the program. I tried following the vignette as follows:

library("DESeq2")
CountTable = read.table("test.tsv", header=TRUE, row.names=1)
head(CountTable)

gene    tissue1    tissue2    tissue3    tissue4
gene1    233    91    17    593
gene2    1011    0    7    1
gene3    963    2    3    66
gene4    908    41    1    74
gene5    596    50    26    328
gene6    1    0    0    0

colData = data.frame(
  row.names= colnames(CountTable),
  condition = c("tissue1", "tissue2", "tissue3", "tissue4", "tissue5"),
  libType = c( "single-end", "single-end", "single-end", "single-end", "single-end"))

dds <- DESeqDataSetFromMatrix( countData = CountTable, colData = colData,
  design = ~ condition)
dds <- DESeq(dds)
#estimating size factors
#estimating dispersions
#gene-wise dispersion estimates
#mean-dispersion relationship
#final dispersion estimates
#fitting model and testing
#Warning message:
#In checkForExperimentalReplicates(object, modelMatrix) :
 #same number of samples and coefficients to fit,
 #estimating dispersion by treating samples as replicates.
 #read the ?DESeq section on 'Experiments without replicates'

res <- results(dds)
res

#log2 fold change (MAP): condition testis vs brain
#Wald test p-value: condition tissue5 vs tissue1
#DataFrame with 927 rows and 6 columns
                   baseMean log2FoldChange     lfcSE        stat    pvalue      padj

So the result only contains the comparison between tissue5 and tissue1. What do I need to do to get the comparison between each pair of tissues?

Help is greatly appreciated.

P.S.: I am new to R and this is my first time using DESeq2.

RNA-Seq DESeq2 • 29k views

If you only have 5 samples, and they are all different, you can't do any kind of sophisticated statistical analysis on them. There is natural variance of expression, but without biological replicates, you have zero idea what it is. DESeq2 might give you numbers, but they don't mean much.

9.5 years ago
eromasko ▴ 120

Looking up 'results' in the DESeq2 manual at http://www.bioconductor.org/packages/release/bioc/manuals/DESeq2/man/DESeq2.pdf, I found the following passage, which explains why you only get the comparison of the last condition against the first:

The results table when printed will provide the information about the comparison, e.g. "log2 fold change (MAP): condition treated vs untreated", meaning that the estimates are of log2(treated /untreated), as would be returned by contrast=c("condition","treated","untreated"). Multiple results can be returned for analyses beyond a simple two group comparison, so results takes arguments contrast and name to help the user pick out the comparisons of interest for printing a results table. The use of the contrast argument is recommended for exact specification of the levels which should be compared and their order. If results is run without specifying contrast or name , it will return the comparison of the last level of the last variable in the design formula over the first level of this variable. For example, for a simple two-group comparison, this would return the log2 fold changes of the second group over the first group (the reference level). Please see examples below and in the vignette.
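
To make the quoted rule concrete, here is a small base-R sketch of how the default comparison follows from the factor levels. It assumes the condition column ends up as a factor with R's default alphabetical level order, which is what happens with the colData in the question; the variable names are only illustrative.

```r
# Condition labels as given in the question's colData
condition <- factor(c("tissue1", "tissue2", "tissue3", "tissue4", "tissue5"))

# Factor levels are sorted alphabetically by default; the first is the reference
lv <- levels(condition)

# With no contrast/name argument, results() reports last level vs first level,
# which corresponds to this contrast vector: (variable, numerator, denominator)
default_contrast <- c("condition", lv[length(lv)], lv[1])
print(default_contrast)

# To use another reference level, relevel the factor before building the dataset
condition2 <- relevel(condition, ref = "tissue3")
print(levels(condition2))
```

Releveling only changes which comparison is the default; any other pair can still be requested explicitly through the contrast argument.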

On a side note, did you look up the information in ?DESeq on 'Experiments without replicates' as the warning message says? I copied a little bit from the manual:

Experiments without replicates do not allow for estimation of the dispersion of counts around the expected value for each group, which is critical for differential expression analysis. If an experimental design is supplied which does not contain the necessary degrees of freedom for differential analysis, DESeq will provide a message to the user and follow the strategy outlined in Anders and Huber (2010) under the section 'Working without replicates', wherein all the samples are considered as replicates of a single group for the estimation of dispersion. As noted in the reference above: "Some overestimation of the variance may be expected, which will make that approach conservative." Furthermore, "while one may not want to draw strong conclusions from such an analysis, it may still be useful for exploration and hypothesis generation."
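
A quick way to see whether a design has replicates at all is to count samples per condition before running DESeq. This is just a sketch in base R; the colData below is rebuilt to mirror the one in the question, and `lacks_replicates` is a name I made up for illustration.

```r
# Sample sheet with the same layout as the question's colData
colData <- data.frame(
  row.names = paste0("tissue", 1:5),
  condition = paste0("tissue", 1:5)
)

# Count samples per condition; a group with a single sample has no replicate
reps <- table(colData$condition)
print(reps)

# TRUE when at least one condition has fewer than two samples
lacks_replicates <- any(reps < 2)
print(lacks_replicates)
```

For the data in the question every tissue has exactly one sample, so there is nothing to estimate within-group dispersion from.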

3.3 years ago
195472005 ▴ 20

DESeq2 can return results for any pair of the groups you supply. It seems your problem lies in how you extract the results with this command:

results(dds)

Called without any other arguments, this is equivalent to the command below, which compares the last level with the first (tissue5 vs tissue1, as your output shows):

results(dds, contrast=c("condition", "tissue5", "tissue1"))

So if you want the results for other pairs of groups, call results() again with a different contrast, like this:

results(dds, contrast=c("condition", "tissue1", "tissue4")) # log2 fold change of tissue1 over tissue4
results(dds, contrast=c("condition", "tissue1", "tissue3")) # log2 fold change of tissue1 over tissue3
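
If you want every pairwise comparison rather than typing each contrast by hand, something along these lines should work. It is only a sketch: the contrast vectors are built with base R's combn(), and the results() call is guarded so it runs only if a fitted DESeqDataSet named `dds` (as in the question) exists. The list names such as `tissue2_vs_tissue1` are my own convention, not something DESeq2 requires.

```r
# Tissue labels from the question
tissues <- paste0("tissue", 1:5)

# All unordered pairs: choose(5, 2) = 10 comparisons
pairs <- combn(tissues, 2, simplify = FALSE)

# One contrast vector per pair: c(variable, numerator, denominator)
contrasts <- lapply(pairs, function(p) c("condition", p[2], p[1]))
names(contrasts) <- vapply(
  pairs,
  function(p) paste(p[2], "vs", p[1], sep = "_"),
  character(1)
)

# Extract one results table per contrast, but only when a fitted
# DESeqDataSet called `dds` is available in the session
if (exists("dds") && requireNamespace("DESeq2", quietly = TRUE)) {
  all_res <- lapply(contrasts, function(ct) DESeq2::results(dds, contrast = ct))
}
```

As far as I know, each results() call adjusts p-values only within its own comparison, so keep that in mind when looking across all ten tables.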

The manual addresses this directly:

Multiple results can be returned for analyses beyond a simple two group comparison, so results takes arguments contrast and name to help the user pick out the comparisons of interest for printing a results table


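This question is well worth answering. My English is not good, so here is a second, more detailed version of my answer, originally written in Chinese:

First, input like that in the question now raises an error in newer versions of DESeq2, because without replicates the results have no statistical validity. The error message looks roughly like this: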

Error in checkForExperimentalReplicates(object, modelMatrix) : 
  The design matrix has the same number of samples and coefficients to fit,
  so estimation of dispersion is not possible. Treating samples
  as replicates was deprecated in v1.20 and no longer supported since v1.22.
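
The condition the message describes can be reproduced with base R's model.matrix(). This is only a sketch of the idea, not DESeq2's actual internal check, and the replicated design below is hypothetical (three samples per tissue).

```r
# Design from the question: five samples, each its own condition
no_reps <- data.frame(condition = factor(paste0("tissue", 1:5)))
mm_no_reps <- model.matrix(~ condition, data = no_reps)

# Five samples (rows) and five coefficients (columns): nothing is left over
# to estimate dispersion, which is the situation the error describes
print(dim(mm_no_reps))

# Hypothetical design with three samples per tissue
with_reps <- data.frame(condition = factor(rep(paste0("tissue", 1:5), each = 3)))
mm_with_reps <- model.matrix(~ condition, data = with_reps)

# Fifteen samples, five coefficients: ten residual degrees of freedom
print(dim(mm_with_reps))
```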

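Following the approach in the question, the poster defined several tissue groups in the sample table colData and hoped DESeq2 would perform all pairwise differential analyses in one go, but the result contained only the tissue5 vs tissue1 comparison and none of the others. According to the manual, the fitted model can return results for any pair of groups. The poster did not get the expected output because of how the results were extracted with the results() function: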

Multiple results can be returned for analyses beyond a simple two group comparison, so results takes arguments contrast and name to help the user pick out the comparisons of interest for printing a results table

results(dds) with no arguments is equivalent to results(dds, contrast=c("condition", "tissue5", "tissue1")), so it returns only the comparison between those two groups. To get the other comparisons, change the contrast argument:

results(dds, contrast=c("condition", "tissue1", "tissue4")) # extract the DE results for tissue1 vs tissue4
results(dds, contrast=c("condition", "tissue1", "tissue3")) # extract the DE results for tissue1 vs tissue3
9.5 years ago

You can perform this analysis on the Genestack platform. Its differential gene expression tool, Expression Navigator, is based on the DESeq2 (or edgeR) R package, and it can find DE genes between multiple groups of samples (in your case, 5 groups, one per tissue).
