Hi all,
I have a set of normalised, log2-transformed gene expression data for two age groups. I fitted a linear regression to each gene and used an F test to determine whether the slope of expression against age differs from zero, which in this case would indicate an association between the expression signal and age.
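To make the test concrete, here is a minimal Python sketch of a per-gene least-squares fit with the F statistic for the slope. It is an illustration only, not the code that produced the output further down. The function name `slope_f_test` is made up for this example, and treating age as a numeric covariate (2, 4, 6) is an assumption. The expression values are copied from the sample input in this post.

```python
def slope_f_test(ages, values):
    """Fit values = intercept + slope * age by ordinary least squares.

    Returns (intercept, slope, f_stat). f_stat tests slope == 0 and has
    1 and n - 2 degrees of freedom. (Hypothetical helper for illustration.)
    """
    n = len(ages)
    if n != len(values) or n < 3:
        raise ValueError("need at least 3 paired observations")

    mean_x = sum(ages) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in ages)
    if sxx == 0:
        raise ValueError("all ages are identical, slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, values))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Split the variation into the part explained by age and the residual.
    ss_reg = slope * sxy
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(ages, values))

    if ss_res == 0:
        # Perfect fit: F is unbounded (or undefined if nothing varies).
        f_stat = float("inf") if ss_reg > 0 else float("nan")
    else:
        f_stat = ss_reg / (ss_res / (n - 2))
    return intercept, slope, f_stat


# Assumption: the nine columns are three replicates each at ages 2, 4 and 6.
ages = [2, 2, 2, 4, 4, 4, 6, 6, 6]
genes = {
    "0610005C13Rik": [0.50, 0.88, 0.14, 0.48, 0.096, 0.66, 0.14, 0.13, 0.48],
    "0610007P14Rik": [0.84, 0.94, 1.07, 0.79, 0.96, 0.99, 0.95, 0.80, 0.86],
}

for name, values in genes.items():
    intercept, slope, f_stat = slope_f_test(ages, values)
    direction = "increases" if slope > 0 else "decreases"
    print(f"{name}: intercept={intercept:.4f} slope={slope:.4f} "
          f"F={f_stat:.4f} (expression {direction} with age)")
```

In a fit like this, the sign of the slope gives the direction of change with age, and the F statistic (compared against an F distribution with 1 and n - 2 degrees of freedom) gives the p-value.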
This is a sample of my input:
Gene Age2 Age2 Age2 Age4 Age4 Age4 Age6 Age6 Age6
0610005C13Rik 0.50 0.88 0.14 0.48 0.096 0.66 0.14 0.13 0.48
0610006L08Rik 0.085 0.055 0.024 0.02 0.44 0.10 0.02 0.06 0.14
0610007P14Rik 0.84 0.94 1.07 0.79 0.96 0.99 0.95 0.80 0.86
0610009B22Rik 1.1 0.99 1.29 1.31 0.96 1.23 1.27 0.83 1.0
0610009L18Rik 0.83 0.91 0.99 0.62 1.09 0.62 1.49 0.78 1.18
This is a sample of my output:
(Intercept) age pval
0610005C13Rik -8.75673954644126e-17 1 6.22604270595892e-113
0610006L08Rik 0.160549044129198 -0.13300883967093 0.470334747352544
0610007P14Rik 0.929818139418018 -0.0345393343981819 0.794039611790303
0610009B22Rik 1.10956301739983 0.0143433761977127 0.952572869214549
My problem is that I don't understand the output, specifically which genes are over- and under-expressed. Could someone explain (1) why the first gene has a significant p-value but an "age" value of 1 (this happens in all of my differential output files: the first gene always has an "age" value of 1), and (2) how to tell from this example which genes are over- and under-expressed (although I can see from the p-values that they won't be significant)?
Thanks
Could you post the method you used to arrive at these results? (Actual code would help.) You mention two age groups, but I see three in your example data (Age2, Age4 and Age6).
Thanks, and sorry for the confusion: I have multiple data sets, some with two age groups and some with more than two. I have switched to limma and posted the input file, the code and the output below. If you have any thoughts, I'd appreciate them.