I have samples across 4 different developmental stages, each stage has 2 replicates. Using DEseq2, I can get the trending line of the counts of each gene, below is the code I have run:
library(DESeq2)
sampleTable <- as.data.frame(sampleTable)
countdata <- read.csv("CSHLRNASeq_counts_Rsubread.csv", stringsAsFactors = FALSE, row.names = 1)
colnames(countdata) <- gsub(".bam" , "", colnames(countdata))
coldata <- sampleTable
coldata$timepoint <- factor(coldata$timepoint, levels = c("E11", "E14", "E18", "Adult"))
ddsMat <- DESeqDataSetFromMatrix(countData = countdata,
colData = coldata,
design = ~ timepoint)
dds <- ddsMat
dds
nrow(dds)
dds <- dds[rowSums(counts(dds)) > 1,] # remove the sample that reads from all group are 0
ddsTC <- DESeq(dds, test = "LRT", reduced = ~1)
resTC <- results(ddsTC)
mcols(ddsTC)
nrow(dds)
library("AnnotationDbi")
library("org.Mm.eg.db")
resTC$symbol <- mapIds(org.Mm.eg.db, keys = row.names(resTC), column = "SYMBOL", keytype = "ENSEMBL", multiVals = "first")
resTC$entrez <- mapIds(org.Mm.eg.db, keys = row.names(resTC), column = "ENTREZID", keytype = "ENSEMBL", multiVals = "first")
head(resTC[order(resTC$padj),], 4)
data_Flna <- plotCounts(ddsTC, which(resTC$symbol == "Flna"), intgroup = c("timepoint"), returnData = TRUE)
ggplot(data_Flna, aes(x=timepoint,y=count)) +
geom_point() + stat_smooth(se=FALSE, method = "loess") +scale_y_log10()
resultsNames(ddsTC)
Following the manual I typed:
res30 <- results(ddsTC, name = "timepoint", test = "Wald")
and got an error:
Error: subscript contains invalid names
One of the purpose of using timepoint is that I would like to see the comparisons between E11vsE14, E14vsE18, E18vsAdult, so that I can get the genes whose expression level is higher at later stages than earlier stages. I see in some papers linear regression is one solution. I am wondering how can I achieve the goal from what I have done so far? Thank you very much