Question

Linear regression of the RNAseq results to get the genes whose expression change belong to the same trends

0

Entering edit mode

4.5 years ago

Kai_Qi ▴ 130

I have samples across 4 different developmental stages, each stage has 2 replicates. Using DEseq2, I can get the trending line of the counts of each gene, below is the code I have run:

library(DESeq2)
sampleTable <- as.data.frame(sampleTable)
countdata <- read.csv("CSHLRNASeq_counts_Rsubread.csv", stringsAsFactors = FALSE, row.names = 1)
colnames(countdata) <- gsub(".bam" , "", colnames(countdata))
coldata <- sampleTable
coldata$timepoint <- factor(coldata$timepoint, levels = c("E11", "E14", "E18", "Adult"))
ddsMat <- DESeqDataSetFromMatrix(countData = countdata,
                                 colData = coldata,
                                 design = ~ timepoint)
dds <- ddsMat
dds
nrow(dds)
dds <- dds[rowSums(counts(dds)) > 1,] # remove the sample that reads from all group are 0
ddsTC <- DESeq(dds, test = "LRT", reduced = ~1)
resTC <- results(ddsTC)
mcols(ddsTC)
nrow(dds)
library("AnnotationDbi")
library("org.Mm.eg.db")
resTC$symbol <- mapIds(org.Mm.eg.db, keys = row.names(resTC), column = "SYMBOL", keytype = "ENSEMBL", multiVals = "first")
resTC$entrez <- mapIds(org.Mm.eg.db, keys = row.names(resTC), column = "ENTREZID", keytype = "ENSEMBL", multiVals = "first")
head(resTC[order(resTC$padj),], 4)
data_Flna <- plotCounts(ddsTC, which(resTC$symbol == "Flna"), intgroup = c("timepoint"), returnData = TRUE)
ggplot(data_Flna, aes(x=timepoint,y=count)) +
  geom_point() + stat_smooth(se=FALSE, method = "loess") +scale_y_log10()
resultsNames(ddsTC)

Following the manual I typed:

res30 <- results(ddsTC, name = "timepoint", test = "Wald")

and got an error:

Error: subscript contains invalid names

One of the purpose of using timepoint is that I would like to see the comparisons between E11vsE14, E14vsE18, E18vsAdult, so that I can get the genes whose expression level is higher at later stages than earlier stages. I see in some papers linear regression is one solution. I am wondering how can I achieve the goal from what I have done so far? Thank you very much

RNA-Seq R rna-seq sequencing • 1.4k views

ADD COMMENT • link updated 4.5 years ago by thyleal ▴ 160 • written 4.5 years ago by Kai_Qi ▴ 130

score 1 · Answer 1 · 2020-12-28

Hi,

About your error: you probably mistyped the name argument in your second line of code here. Use resultsNames(ddsTC) to see available names.

About the analysis, you seem to want to test this in two distinc ways. One testing for any differences over multiple stages, exactly what you did with the LRT with full and reduced models. See more here.

If you want to test your groups as levels of a categorical level (timepoint), you can set your timepoint variable to a factor with levels ordered as: E11, E14, E18, adult (assuming this is the order from earliest to latest stages). Then, I suggest using fold-change shrinkage and extract coefficients according to your contrasts. By default, this will test all levels against the first level (baseline). See more of this here.

You may need to relevel your factor variable if you're using apeglm shrinkage. However, ashr and normal algorithms can deal with contrasts, so you can specify them as needed. Please see the aforementioned link.

Finally, If you want to see the linear effect of your stages (assuming linearity) you can change your timepoint to a numeric variable, like 0, 1, 2, 3 and use it in DESeq2 as usual. Thus, the reported log2 fold change is per unit of change of your stage variable. If your stages do not match the one unit transformation, this will make interpretation more difficult.

Hope it helps.