Question

RNA-Seq time course DESeq2 analysis

0

Entering edit mode

7.4 years ago

remi.willamme • 0

Hello,

I'm totally new in bioinformatics and have learned by myself with tuto's and guides during the last few months.

My experiment is a RNA-Seq profiling of diurnal cultures : I have 2 strains (WT and mutant) and 3 biological replicates for each, with sampling every 4h for 28h, which makes 8 time points (0 - 4 - 8 - 12 - 16 - 20 - 24 - 28). Thus making a total of 48 sets of data.

I already did the whole process of mapping the reads and counting them to end up with both raw counts and FPKM. I'm using the FPKM log2-fold change values with MapMan. And I have started to use the counts for a Differential Gene Expression analysis in R with DESeq2. My goal is to analyze how the transcriptome (17741 transcripts) is evolving during a day/night cycle but also to identify and analyze the differences between my 2 strains.

I already built the DESeqDataSets and DESeqResults with a complete design (design=~strain + time + strain:time) and with a reduced design : (design=~strain) or (design=~time) But that's where I'm a bit lost, despite reading any manual I could find. I don't get what I will get for results, what I should do with those values, how I can use them for graphics, how to chose which test is more relevant, ...

I probably didn't expose my situation clearly enough for you specialists, but any help would be so much appreciated!

Thanks already! Rémi

RNA-Seq DESeq2 time-course • 4.3k views

ADD COMMENT • link updated 7.4 years ago by Constantine ▴ 290 • written 7.4 years ago by remi.willamme • 0

0

Entering edit mode

I'm not 100% sure I completely understand your question, but it seems like you want to find significantly differentially expressed transcripts from timepoint to timepoint but also wild-type to mutant. Starting with the latter because that's much more straightforward, you can recover differentially expressed genes between each condition at each timepoint (eg at 0 hours, genes X,Y,Z are up/down in mutant compared to control with pvalues ...). Then you can plot the FC by FPKM or FC by -pvalue in an MA plot or Volcano plot respectively. For the timeseries, DE testing is a little trickier because there's a lot of comparisons you can make. A clustering strategy then might be more informative to identify groups of genes that move together through the time series.

ADD REPLY • link 7.4 years ago by Jake Warner ▴ 840

0

Entering edit mode

Hi Jacob,

Sorry for that late reaction and thanks for your comment! You perfectly understood my situation : i would like to find significantly differentially expressed transcripts over the time (48h in my case) but i also would like to compare wild-type results to mutant. I now get that I can't do both in the same analysis, can I? As you said, the most difficult will be for the timeseries. How should I do for clustering my transcripts?

Here's what I've done so far, maybe it will give you better information than when I try to describe it:

dds <- DESeq2::DESeqDataSetFromMatrix(countData = countsData, colData = colData, design = ~ strain + time + strain:time)

analysis <- DESeq2::DESeq(dds)

resultsNames(analysis)

[1] "Intercept" "strain_mutant_vs_complemented" "time_12h_vs_0h" "time_16h_vs_0h"
[5] "time_20h_vs_0h" "time_24h_vs_0h" "time_28h_vs_0h" "time_4h_vs_0h"
[9] "time_8h_vs_0h" "strainmutant.time12h" "strainmutant.time16h" "strainmutant.time20h"
[13] "strainmutant.time24h" "strainmutant.time28h" "strainmutant.time4h" "strainmutant.time8h"

head(analysis)

class: DESeqDataSet dim: 6 46 metadata(1): version assays(3): counts mu cooks rownames(6): Cre01.g000017.v5.5 Cre01.g000033.v5.5 ... Cre01.g000150.v5.5 Cre01.g000200.v5.5 rowData names(77): baseMean baseVar ... deviance maxCooks colnames(46): B0_1 B0_2 ... L28_2 L28_3 colData names(5): strain time strain_time phase sizeFactor

results <- results(analysis)

summary(results)

out of 17021 with nonzero total read count adjusted p-value < 0.1 LFC > 0 (up) : 2203, 13% LFC < 0 (down) : 2320, 14% outliers [1] : 0, 0% low counts [2] : 1976, 12% (mean count < 2) [1] see 'cooksCutoff' argument of ?results [2] see 'independentFiltering' argument of ?results

ADD REPLY • link 7.3 years ago by remi.willamme • 0

0

Entering edit mode

I can think of a couple ways to do this but I'm not sure which is best. The first is to make the lists of DE genes for your comparisons in the WT data and mutant data. The comparison are up to you: as suggested by Constantine they can be all relative to 0 (0:4, 0:8, and so on) or between timepoints (0:4, 4:8 and so on). I also usually cutoff my DE lists (abs(FC) > 2, p-value < 0.05). So your DE lists for the WT and mutant would be all genes that are DE, in any one of the comparisons and passing the filter. Then, you can compare these two lists with a Venn diagram to see the genes that are DE in both datasets/ unique to one dataset etc.

The second way is to go the clustering route. There's a lot of clustering methods. For example using hierarchical clustering and cutting the dendrogram will usually recover time-profiles well enough. I wrote a tutorial for that here. Another option is using fuzzy means clustering with Mfuzz. I would cluster the data in both datasets independently and then compare cluster assignments. You could use a contingency table and fischers exact test to checkout the overlap.

ADD REPLY • link 7.2 years ago by Jake Warner ▴ 840

score 0 · Answer 1 · 2017-07-19

I recommend you do a differential expression analysis for each timepoint vs 0 in each condition. That is:

4h - 0 (WT)
8h - 0 (WT)
12h - 0 (WT)
.
.
.
28h - 0 (mutant)

and see if you find any statistical and biological significant changes (Normalise your raw counts for that -- RPKM/ FPKM values are not very meaningful for statistics).

Now let's say the 28h is where you get most statistically significant transcriptomic changes. You can use those genes and plot a heatmap across the different timepoints for each condition and also observe whether the mutant displays different patterns.

So long story short, looking directly at 17741 transcripts (using a heatmap) over the course of time won't give you much info. You need to narrow down the list to the genes that are affected over the day/night cycle.