Hello,
I have RNA-seq data with 6 replicates over 12 time points ... but the data is just two different 'experimental conditions' (two different sides of a plant in sunlight over 24 hours) with no control -- which I might have suggested be the plant in the absence of normal stimuli on each side, or some unrelated stem tissue from the plant.
I plan on using edgeR to do a two-factor analysis, testing for differential expression between the samples with the first time point as a baseline. So I will test:
(TimePoint_X_Condition1 - TimePoint_1_Condition1) - (TimePoint_X_Condition2 - TimePoint_1_Condition2)
for each time point X (2 - 12), where Condition and TimePoint are the two factors for the count based experiment.
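To make the test above concrete, here is a minimal sketch of the contrast vectors it implies, assuming one mean per Condition x TimePoint group (2 x 12 = 24 groups). The group ordering and the helper names `group_index` and `baseline_contrast` are my own assumptions for illustration, not anything edgeR defines.

```python
import numpy as np

N_TIMEPOINTS = 12  # from the post: 12 time points
N_CONDITIONS = 2   # from the post: two sides of the plant


def group_index(condition, timepoint):
    """Position of the (condition, timepoint) group; both arguments are 1-based.

    Assumed ordering: all time points of Condition1 first, then Condition2.
    """
    return (condition - 1) * N_TIMEPOINTS + (timepoint - 1)


def baseline_contrast(x, baseline=1):
    """Contrast for (TP_x_C1 - TP_base_C1) - (TP_x_C2 - TP_base_C2)."""
    if x == baseline:
        raise ValueError("x must differ from the baseline time point")
    c = np.zeros(N_CONDITIONS * N_TIMEPOINTS)
    c[group_index(1, x)] += 1.0
    c[group_index(1, baseline)] -= 1.0
    c[group_index(2, x)] -= 1.0
    c[group_index(2, baseline)] += 1.0
    return c


# One contrast per time point X = 2..12, stacked as rows.
contrasts = np.vstack([baseline_contrast(x) for x in range(2, N_TIMEPOINTS + 1)])
print(contrasts.shape)  # (11, 24)
```

Each row sums to zero and touches exactly four groups, so it is a difference of differences: it asks whether the change from baseline differs between the two conditions.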
Does this test make sense (sound, valid)? I reason that since I can't really treat either treatment as the control for the other in normal differential expression testing, it might make sense to test the differences in expression compared to baseline for each treatment. This assumes that TimePoint_1 is indeed a baseline for the treatments, which it really isn't... so I may end up testing each difference in expression between two points in the time series for one condition against each other such difference in the other condition:
(TimePoint_X_Condition_1 - TimePoint_Y_Condition_1) - (TimePoint_X_Condition_2 - TimePoint_Y_Condition_2)
for each pair of time points X ≠ Y. I guess I would then combine the differentially expressed genes from every pairwise test into one list and count how many times each gene appears, that is, how many of the tests called it differentially expressed. I might then break them into sublists of "differentially expressed in more than N tests", where N is some arbitrary threshold.
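As a rough sketch of that tallying step: with 12 time points there are 66 unordered pairs (swapping X and Y only flips the sign of the contrast), and the counting itself is a few lines. The gene IDs below are placeholders, not real results, and the helper names `tally_hits` and `genes_in_at_least` are hypothetical. The sketch uses "at least k tests" as the cutoff, which is as arbitrary as any other.

```python
from collections import Counter
from itertools import combinations

N_TIMEPOINTS = 12

# Every unordered pair (X, Y) with X != Y.
pairs = list(combinations(range(1, N_TIMEPOINTS + 1), 2))
print(len(pairs))  # 66


def tally_hits(de_genes_by_pair):
    """Count, for each gene, in how many pairwise tests it was called DE."""
    counts = Counter()
    for genes in de_genes_by_pair.values():
        counts.update(set(genes))  # set(): count a gene once per test
    return counts


def genes_in_at_least(counts, k):
    """Genes called DE in at least k tests, sorted for reproducibility."""
    return sorted(g for g, n in counts.items() if n >= k)


# Hypothetical toy input: placeholder gene IDs keyed by (X, Y) pair.
toy = {
    (1, 2): ["geneA", "geneB"],
    (1, 3): ["geneA"],
    (2, 3): ["geneA", "geneC"],
}
counts = tally_hits(toy)
print(genes_in_at_least(counts, 2))  # ['geneA']
```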
It just all seems wrong, but I cannot think of anything better to do without a control for the 'experimental conditions'.
What can I do to analyze two experimental conditions for differential expression without a true control?
You keep saying analyze and differential expression. What was the purpose of the biological experiment? That's what will tell you the relevant analysis. Maybe you're looking for genes that activate in both conditions relative to timepoint 1. Maybe you're looking for genes that are unique to one condition.
Perhaps I didn't make this clear. I am supposed to be looking for differentially expressed genes between conditions (each gene is measured in each condition) over the entire time series. It seems problematic, then, for each condition to be 'experimental' while there is no control condition, since most packages that perform differential expression testing (and indeed, statistical tests generally) expect there to be a control against which to test an experimental condition.
My point is you have many factors and many choices when handling a time series. At any one timepoint you have two conditions and either could be named control for the sake of statistical software. Just rename the left side control, and the right side treatment, and 'upregulated genes' will translate to those higher on the right, and 'downregulated' will mean those higher on the left. It's just terminology.
My remark about having a hypothesis is because with a time series, you might be interested only in those genes showing a consistent trend, or maybe you could ignore the middle timepoints and just compare the first and last timepoints with L/R as a covariate. Then the first timepoint could be called control.
If your experimental setup would expect left and right to be equal at time=0, then they could be pooled as a replicated control.
Ah, thanks for making me understand.
For time-series RNA-seq data, you can try STEM (Short Time-series Expression Miner). STEM can group genes based on their expression patterns along the time course, and it reports whether a group of genes is significantly enriched for particular GO terms. It can also compare the right side and the left side over the time course.
Short Time-series Expression Miner (STEM)
http://www.cs.cmu.edu/~jernst/stem/
Jason Ernst Lab
http://www.biolchem.ucla.edu/labs/ernst/
If you want to find differentially expressed genes between the two conditions then just use a model of
~Condition*TimePoint
and use the p-values/fold-changes associated with Condition. As karl.stamm said, it's the biological question that completely dictates the design.
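To see what the coefficients of such a model mean, here is a small numeric sketch, assuming treatment (reference-level) coding as R uses by default for unordered factors. The group means are made-up numbers and `design_row` is a hypothetical helper; nothing here calls edgeR itself. Under that coding, the Condition coefficient alone is the difference between conditions at the reference time point, while each Condition:TimePoint interaction coefficient is the difference of differences from the original question.

```python
import numpy as np

N_TIMEPOINTS = 12


def design_row(condition, timepoint):
    """Row of a treatment-coded ~Condition*TimePoint design (both 0-based).

    Assumed column layout: 0 = intercept, 1 = Condition,
    2..12 = TimePoint, 13..23 = Condition:TimePoint.
    """
    row = np.zeros(2 * N_TIMEPOINTS)
    row[0] = 1.0
    if condition == 1:
        row[1] = 1.0
    if timepoint > 0:
        row[1 + timepoint] = 1.0
        if condition == 1:
            row[N_TIMEPOINTS + timepoint] = 1.0
    return row


# Made-up log-expression means for one gene: mu[condition, timepoint].
rng = np.random.default_rng(0)
mu = rng.normal(size=(2, N_TIMEPOINTS))

cells = [(c, t) for c in range(2) for t in range(N_TIMEPOINTS)]
X = np.array([design_row(c, t) for c, t in cells])
y = np.array([mu[c, t] for c, t in cells])

# Saturated model: 24 group means, 24 coefficients, so the fit is exact.
beta = np.linalg.solve(X, y)

condition_coef = beta[1]
interaction_coefs = beta[N_TIMEPOINTS + 1:]

# Condition coefficient = difference at the reference time point only.
print(np.isclose(condition_coef, mu[1, 0] - mu[0, 0]))

# Interaction at time point t = (mu[1,t] - mu[1,0]) - (mu[0,t] - mu[0,0]).
dod = (mu[1, 1:] - mu[1, 0]) - (mu[0, 1:] - mu[0, 0])
print(np.allclose(interaction_coefs, dod))
```

The sign of each term depends on which condition and which time point are the reference levels, so it is worth checking the factor levels before reading off the direction of a fold change.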