Hi there,
I'm currently doing an RNAseq analysis of a time course experiment with six timepoints, two genotype conditions and three replicates each. After STAR alignment and transcript quantification I used the DESeq2 package for count normalization and differential expression anlaysis just as decribed in the vignette with the following code:
dds <- DESeqDataSetFromMatrix(countData=count_matrix, colData=coldata, design= ~ genotype + timepoint + genotype:timepoint)
dds <- DESeq(dds, test="LRT", reduced = ~ timepoint+genotype)
res <- results(dds)
This all works well. However, if I look at the most signficant genes, I get time course plots such as this:
time course plot of significant gene
I don't doubt that there is a significant difference between these two conditions but I don't think they are biologically interesting as I seem to picking up noise here. Would you agree with this opinion? Is there an option to use DESeq2 to find genes with a non-parallel course or do I have to use a different package for that?
Thank you very much for your help! If you need any further information, please let me know.
Best, Moritz
Your problem here is that what you are asking DESeq to do is fit a linear regression on expression vs time for each genotype and then ask if the slope of that line is different betweeen genotypes. But a linear regression is clearly unsuitable here - gene expression is not linearly dependent on time, but has some more complex relationship.
I think there are more comlpex ways to solve this problem, but my first instict would be to make timepoint a factor rather than a numeric predictor.
You will unforunately loose a lot of power doing this - you'll be fitting a parameter for each timepoint for each genotype (the difference, for that genotype, from the first timepoint), rather than just one parameter for each genotype (the slope), but it should avoid the problem you identifiy.
The alternative is to transform the problem into one that is a linear regression, but I'm a bit out of my depth there.
DESeq doesn't do linear regression, it treats time as factor as far as I know, you will get differential expression if at any time point there will be a significant deviation from expected value.
Are you sure you can't provide continuouspredictors to DESeq, you definitely can to edgeR.
See for example: https://support.bioconductor.org/p/97262/, which is a helpful post all round for this question.
You're right, my bad. Didn't know DESeq can do that. In that case, indeed the OP should use time as factor or use a non-linear approach
Thank you both for your quick answers! Sorry for not being specific enough, I actually did use timepoint as a factor here and still got these results. Any other ideas or experiences? I'm currently also using the maSigPro and splineTimeR package to do an analysis with timepoint as a continuous variable but first results also don't look too promising.
Yes, it doesn't seem right. Maybe asking in the parallel bioconductor forum would be a better option (and delete this post if no one has answered it)