DESeq2 differential expression analysis for time course experiment: course of significant genes very similiar between tested conditions
0
1
Entering edit mode
5.4 years ago
cmgraef ▴ 10

Hi there,

I'm currently doing an RNAseq analysis of a time course experiment with six timepoints, two genotype conditions and three replicates each. After STAR alignment and transcript quantification I used the DESeq2 package for count normalization and differential expression anlaysis just as decribed in the vignette with the following code:

dds <- DESeqDataSetFromMatrix(countData=count_matrix, colData=coldata, design= ~ genotype + timepoint + genotype:timepoint)
dds <- DESeq(dds, test="LRT", reduced = ~ timepoint+genotype)
res <- results(dds)

This all works well. However, if I look at the most signficant genes, I get time course plots such as this:

time course plot of significant gene

I don't doubt that there is a significant difference between these two conditions but I don't think they are biologically interesting as I seem to picking up noise here. Would you agree with this opinion? Is there an option to use DESeq2 to find genes with a non-parallel course or do I have to use a different package for that?

Thank you very much for your help! If you need any further information, please let me know.

Best, Moritz

RNA-Seq DESeq2 time course experiment • 3.5k views
ADD COMMENT
2
Entering edit mode

Your problem here is that what you are asking DESeq to do is fit a linear regression on expression vs time for each genotype and then ask if the slope of that line is different betweeen genotypes. But a linear regression is clearly unsuitable here - gene expression is not linearly dependent on time, but has some more complex relationship.

I think there are more comlpex ways to solve this problem, but my first instict would be to make timepoint a factor rather than a numeric predictor.

You will unforunately loose a lot of power doing this - you'll be fitting a parameter for each timepoint for each genotype (the difference, for that genotype, from the first timepoint), rather than just one parameter for each genotype (the slope), but it should avoid the problem you identifiy.

The alternative is to transform the problem into one that is a linear regression, but I'm a bit out of my depth there.

ADD REPLY
0
Entering edit mode

DESeq doesn't do linear regression, it treats time as factor as far as I know, you will get differential expression if at any time point there will be a significant deviation from expected value.

ADD REPLY
0
Entering edit mode

Are you sure you can't provide continuouspredictors to DESeq, you definitely can to edgeR.

ADD REPLY
0
Entering edit mode

See for example: https://support.bioconductor.org/p/97262/, which is a helpful post all round for this question.

ADD REPLY
0
Entering edit mode

You're right, my bad. Didn't know DESeq can do that. In that case, indeed the OP should use time as factor or use a non-linear approach

ADD REPLY
0
Entering edit mode

Thank you both for your quick answers! Sorry for not being specific enough, I actually did use timepoint as a factor here and still got these results. Any other ideas or experiences? I'm currently also using the maSigPro and splineTimeR package to do an analysis with timepoint as a continuous variable but first results also don't look too promising.

ADD REPLY
1
Entering edit mode

Yes, it doesn't seem right. Maybe asking in the parallel bioconductor forum would be a better option (and delete this post if no one has answered it)

ADD REPLY

Login before adding your answer.

Traffic: 1746 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6