Time-course RNA-seq analysis in Deseq2
2
1
Entering edit mode
6.1 years ago
Lisa ▴ 330

Hi,

I am looking to do a time-course analysis of my RNA-seq human, whole blood datasets. I have 30 samples, across 4 time-points. I have no replicates.

T0 - Week 0, initial treatment T1 - Week 2, two weeks into treatment T2 - Week 12, twelve weeks into treatment T3 - Week 26, end of treatment; “Cure”

I have done the pairwise comparisons, comparing each time-point to T3, our cure state, using Deseq2. Now I want to track the changes of gene expression over the full treatment time.
I have read the time-course Vinaigrette on Deseq2 but I am not sure I am understanding it right.

I have assigned two variables, timepoint and SampleID to account for the different times and samples.

(timepoint <- factor(c(rep("0",30),rep("2",30)rep("12",30)rep("26",30))))

(coldata <- data.frame(row.names=colnames(countdata), timepoint, SampleID))

ddsMat <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~SampleID + timepoint)

dds <- DESeq(ddsMat)
res <- results(dds)

However I don’t know if this is enough. I can’t see where the time element comes into play. Any advice on doing a Time- course analysis?

RNA-Seq R • 8.0k views
ADD COMMENT
4
Entering edit mode
6.1 years ago

Hey Lisa,

You have already controlled for time via the inclusion of timepoint in the design formula. You have also done your pairwise comparisons, presumably like this:

res1 <- results(dds, contrast=c("timepoint", "T1", "T0"), independentFiltering=TRUE, alpha=0.05, pAdjustMethod="BH", parallel=FALSE)
res1 <- lfcShrink(dds, contrast=c("timepoint", "T1", "T0"), res=res1)

res2 <- results(dds, contrast=c("timepoint", "T2", "T0"), independentFiltering=TRUE, alpha=0.05, pAdjustMethod="BH", parallel=FALSE)
res2 <- lfcShrink(dds, contrast=c("timepoint", "T2", "T0"), res=res2)

res3 <- results(dds, contrast=c("timepoint", "T3", "T0"), independentFiltering=TRUE, alpha=0.05, pAdjustMethod="BH", parallel=FALSE)
res3 <- lfcShrink(dds, contrast=c("timepoint", "T3", "T0"), res=res3)

If you want to do ANOVA between the timepoints, do:

dds <- DESeq(ddsMat, test="LRT", reduced=~SampleID + timepoint)
res <- results(dds)

----------------------------------------------

The remaining part that you're asking is not strictly covered in DESeq2. I would first transform the normalised counts via rlog() (set blind=FALSE) and then you have a few choices:

  • simply plot the distribution of your genes across each time-point as a bar, box-and-whisker, or line plot. You can then perform further tests on these
  • for each gene of interest, independently fit a separate model (e.g. via localised regression) in which your timepoint factor is ordered based on time (so, factor(timepoint, levels=c('T0', 'T1', 'T2', ...)). You can then simply compare these 'gene' models and obtain p-values that way. This would address the question: 'Does GeneX's path across the timepoint differ to that of GeneY?"

Kevin

ADD COMMENT
0
Entering edit mode

Hi Kevin,

I actually did the pairwise comparisons separately, using a featureCounts matrix of just the two time-points in each comparison. Is it better practice to do them together?

When you suggest an ANOVA between time-points do you mean for me to test the variance between the groups to test for differences at the time-points?

I will try the Rlog() to plot the distribution and I will have to read into your suggestion for each gene of interest as I am not sure how I understand how to go about this.

Thanks so much for all your help and advice,

Lisa

ADD REPLY
0
Entering edit mode

I actually did the pairwise comparisons separately, using a featureCounts matrix of just the two time-points in each comparison. Is it better practice to do them together?

Not sure what you mean here(?) - the best way would be to do the pairwise comparisons as I have shown in my sample code.

The ANOVA, then, would find genes that are different between all time-points. This may not exactly be relevant, though!

ADD REPLY
1
Entering edit mode

Yeah, I am redoing it the way you have suggested now.

Sorry if I was unclear. I meant that I subsetted the data into t0.vs.t3, t1.vs.t3, and t2.vs.t3, ran feature counts on them all individually and then did the deseq2 on each subset. I see now that this was unnecessary.

Thanks for clarifying about the ANOVA. I think it would be worth having a look anyway.

Thanks again.

ADD REPLY
1
Entering edit mode

Ah, I see! Yes, doing it that way (sub-setting, et cetera) would introduce bias into the end-results. While saying this, you may see the same genes coming up as being statistically significant. Only those genes on the fringes of statistical significance may differ.

ADD REPLY
0
Entering edit mode

Hi Kevin:

I was guided here by google. I have made my object in that way but I have seen some errors. I have a time course RNA-seq with each timepoint (4 timepoints in total) has 2 replicates:

files <- c ("E11Rep1.txt", "E11Rep2.txt", "E14Rep1.txt", "E14Rep2.txt", "E18Rep1.txt", "E18Rep2.txt", "P56Rep1.txt", "P56Rep2.txt")
x <- readDGE(files, columns=c(1,3))
samplenames <- substring(colnames(x), 1, nchar(colnames(x)))
cts <- x$counts
samples <- data.frame(row.names=colnames(cts), timepoint = c("E11", "E11", "E14","E14", "E18","E18", "P56", "P56"), samplenames)
dds <- DESeqDataSetFromMatrix(countData=cts, colData=samples, design= ~  timepoint)

Everything is fine until the following step:

ddsTC <- DESeq(dds, test = "LRT", reduced = ~timepoint)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
Error in nbinomLRT(object, full = full, reduced = reduced, quiet = quiet,  : 
  less than one degree of freedom, perhaps full and reduced models are not in the correct order

To mimic your advice fully, I did the following:

dds <- DESeqDataSetFromMatrix(countData=cts, colData=samples, design= ~  samplenames+timepoint)
converting counts to integer mode
Error in checkFullRank(modelMatrix) : 
  the model matrix is not full rank, so the model cannot be fit as specified.
  One or more variables or interaction terms in the design formula are linear
  combinations of the others and must be removed.

  Please read the vignette section 'Model matrix not full rank':

  vignette('DESeq2')

What should I do to do the analysis in your comments? Thanks

ADD REPLY
1
Entering edit mode

Hey, if you just want to do the LRT, you would need:

dds <- DESeqDataSetFromMatrix(countData = cts, colData = samples, design = ~  timepoint)
dds <- DESeq(dds, test = 'LRT', reduced = ~ 1)

Let me know if that works.

Further explained here: http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#likelihood-ratio-test

ADD REPLY
0
Entering edit mode

Thank you Kevin. Your advice works very well.

ADD REPLY
0
Entering edit mode
6.1 years ago
Vitis ★ 2.6k

Time course expression analysis could be much more involved than straight differential expression analyses. Sometimes simple MDS/PCA could reveal biological meanings from the data. And there are other time-series analysis inspired methods such as EBSeq-HMM.

https://academic.oup.com/bioinformatics/article/31/16/2614/320882

Here is a review of such methods:

https://www.sciencedirect.com/science/article/pii/S2001037015000392#bb0330

ADD COMMENT
0
Entering edit mode

Thank you for the links to the literature, I will have a read and see what else I can use to answer my questions. At the moment, the differential gene expression seems the most appropriate choice, as the ultimate ideal goal is to potential find biomarkers indicative of “disease” and “cure”. However, I appreciate that there are multiple ways to approach this.

ADD REPLY

Login before adding your answer.

Traffic: 2152 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6