We have a very simple experiment, 1 control (C) and 3 independent treatments (T, B, A). So technically I wrote the following script thinking it would compare C to T, then C to B and finally C to A.
Though at the bottom of my pipeline, it seems that DESeq2 is comparing treatments T vs A. Is there a better way to write the condition so it would compare independentely control to each treatment ? (After that I should cluster genes and output a heatmap)
# DESeq1 libraries
library( "DESeq2" )
library("Biobase")
# Heatmap libraries
library( "genefilter" )
library(gplots)
r2 = read.table("matrix.txt", header=TRUE, row.names=1)
head(r2)
samples <- data.frame(row.names=c("C1", "T1", "B1", "A1"), condition=as.factor(c("C1", "T1", "B1", "A1")))
dds <- DESeqDataSetFromMatrix(countData = as.matrix(r2), colData=samples, design=~condition)
# Run DESeq2
dds <- DESeq(dds)
# DESeq2 Results
res <- results(dds)
head (res)
**********
> mcols(res, use.names=TRUE)
DataFrame with 6 rows and 2 columns
type description
<character> <character>
baseMean intermediate the base mean over all rows
log2FoldChange results log2 fold change (MAP): condition T1 vs A1
lfcSE results standard error: condition T1 vs A1
stat results Wald statistic: condition T1 vs A1
pvalue results Wald test p-value: condition T1 vs A1
padj results BH adjusted p-values
Oh Thanks, I see where my mistake was. I re-wrote the script, can you confirm this is the proper way to use it?
That should be correct.
When you make a factor, the levels are ordered lexicographically and most programs will just assume that the first level in the list of levels (i.e., from levels(
samples$condition
), not the first value ofsamples$condition
) is the base level for comparisons. To do otherwise would open a very large bag of worms (e.g., if you assume that some factors starting with C are control, then you'd piss off people working on cancer:normal comparisons).Unfortunately, it returns an error message.
I have a first column with gene names (before C1), which should be row.names. Am I using data.frame properly? Should I adapt row.names?
Thanks! it seems to be working well now. Though when I run mcols, it says T1 vs C1. What happened to treatment A and B? Is it still comparing them separately versus the control C1?
The
results()
function has a 'coef' option (or something like that) that you can use to extract the results for the other coefficients.