Limma design question
7.9 years ago
Stane ▴ 90

Hello,

I have been working on microarrays using R and limma for differential gene expression analysis. My current design is fairly simple, as I am just using two classes, "control" and "treatment":

design <- model.matrix(~cell_class, data)

But the data also contains different cell lines, so I have been wondering if it would be better to use another design like so:

design <- model.matrix(~cell_class + cell_lines, data)

Both designs lead to very similar output: almost all DE genes are the same, with slight differences in fold change and FDR. I have searched the limma documentation and a few papers without finding a clear answer so far.
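
To make the difference between the two designs concrete, here is a minimal sketch on a made-up, balanced sample sheet (3 cell lines, 2 classes, 2 replicates). The sample table, the simulated gene `y` and its effect sizes are all assumptions for illustration, and base R's `lm()` stands in for limma's per-gene linear fit.

```r
set.seed(1)  # reproducible toy noise

# Hypothetical sample sheet: 3 cell lines x 2 classes x 2 replicates
data <- expand.grid(
  replicate  = 1:2,
  cell_class = c("control", "treatment"),
  cell_lines = c("lineA", "lineB", "lineC")
)

design_simple   <- model.matrix(~cell_class, data)
design_adjusted <- model.matrix(~cell_class + cell_lines, data)

# The treatment coefficient has the same name in both designs;
# the second design adds one column per non-reference cell line.
print(colnames(design_simple))
print(colnames(design_adjusted))

# One simulated gene: treatment effect of 1, plus a large cell line shift
line_effect <- c(lineA = 0, lineB = 2, lineC = 4)
data$y <- 8 + 1 * (data$cell_class == "treatment") +
  unname(line_effect[as.character(data$cell_lines)]) +
  rnorm(nrow(data), sd = 0.3)

fit_simple   <- lm(y ~ cell_class, data)
fit_adjusted <- lm(y ~ cell_class + cell_lines, data)

# Estimated treatment effect (log fold change) under each design
est_simple   <- unname(coef(fit_simple)["cell_classtreatment"])
est_adjusted <- unname(coef(fit_adjusted)["cell_classtreatment"])

# Standard error of that estimate under each design
se_simple   <- summary(fit_simple)$coefficients["cell_classtreatment", "Std. Error"]
se_adjusted <- summary(fit_adjusted)$coefficients["cell_classtreatment", "Std. Error"]

print(c(simple = est_simple, adjusted = est_adjusted))
print(c(simple = se_simple, adjusted = se_adjusted))
```

In this balanced toy layout the two treatment estimates coincide and only the standard error changes, because the cell line shift moves out of the residual variance. With an unbalanced layout (for example one cell line contributing most of the arrays) the estimates themselves can also shift slightly, which would be consistent with small differences in fold change and FDR. In limma, the same coefficient would be selected with `topTable(fit, coef = "cell_classtreatment")` under either design.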

R rna-seq • 1.7k views

Let's wait for other comments but I think your second model is not a bad idea. But I would be cautious about using different cell lines. Are they close enough to be considered as biological replicates? (did you look at the PCA plots?) Do you have biological replicates for the cell_lines condition?


Each cell line contains both classes, with several replicates (at least 2 per class); however, one particular cell line accounts for a large part of the arrays. I also have 3 induction methods for the "treatment" class, so should I update the design like so:

design <- model.matrix(~cell_class + cell_lines + treatment_induction, data)

As for the global PCA, it shows the two classes separating quite nicely.
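
One thing worth checking before fitting the three-term design is whether all of its coefficients are estimable. The sketch below uses a hypothetical sample sheet and assumes the controls are coded with an induction level called "none"; if the real data is coded differently, the outcome may differ. Under that assumption the treatment indicator equals the sum of the three induction indicators, so the design matrix loses one rank.

```r
# Hypothetical sample sheet; controls are given the induction level "none"
# (an assumption about how the data is coded)
samples <- expand.grid(
  replicate           = 1:2,
  treatment_induction = c("none", "ind1", "ind2", "ind3"),
  cell_lines          = c("lineA", "lineB", "lineC")
)
samples$cell_class <- factor(
  ifelse(samples$treatment_induction == "none", "control", "treatment"),
  levels = c("control", "treatment")
)

design <- model.matrix(~cell_class + cell_lines + treatment_induction, samples)

# Compare the number of coefficients with the numerical rank of the design
n_coef      <- ncol(design)
design_rank <- qr(design)$rank

print(colnames(design))
print(c(coefficients = n_coef, rank = design_rank))

# cell_classtreatment is an exact sum of the three induction columns
induction_cols <- grep("^treatment_induction", colnames(design))
redundant <- all(design[, "cell_classtreatment"] ==
                   rowSums(design[, induction_cols]))
print(redundant)
```

When the rank is lower than the number of columns, at least one coefficient cannot be estimated separately from the others. limma's `nonEstimable(design)` is intended for this kind of check and should name the offending column.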


It would be clearer if you shared the design matrix. If I understand correctly, treatment_induction is nested within cell_class (?). Wouldn't it be better to have just one variable containing all the levels, e.g. untreated, treatment_A, treatment_B, etc.?
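
A minimal sketch of the single-variable idea, on a hypothetical sample sheet: the `group` factor, its level names and the noise-free toy gene are assumptions for illustration. A "control vs all treatments" comparison can still be recovered from this design as a contrast, here the mean of the three treatment levels minus untreated.

```r
# Hypothetical sample sheet with one combined factor
samples <- expand.grid(
  replicate  = 1:2,
  group      = c("untreated", "treatment_A", "treatment_B", "treatment_C"),
  cell_lines = c("lineA", "lineB", "lineC")
)

# No intercept: one column per group, plus cell line adjustment columns
design <- model.matrix(~0 + group + cell_lines, samples)
colnames(design) <- sub("^group", "", colnames(design))

# Contrast: average of the three treatments minus untreated
contrast <- setNames(numeric(ncol(design)), colnames(design))
contrast[c("treatment_A", "treatment_B", "treatment_C")] <- 1 / 3
contrast["untreated"] <- -1

# Noise-free toy gene with known group means and cell line shifts
group_means <- c(untreated = 5, treatment_A = 6, treatment_B = 7, treatment_C = 8)
line_shift  <- c(lineA = 0, lineB = 0.5, lineC = -0.5)
y <- unname(group_means[as.character(samples$group)] +
              line_shift[as.character(samples$cell_lines)])

# Least-squares coefficients, then the contrast estimate
coefs    <- qr.coef(qr(design), y)
estimate <- sum(contrast * coefs)

print(round(coefs, 6))
print(estimate)
```

This contrast weights each treatment level equally, whereas a pooled ~cell_class term weights them by how many arrays each one has, so the two need not agree exactly. In limma the same contrast would typically be written as `makeContrasts((treatment_A + treatment_B + treatment_C)/3 - untreated, levels = design)` and passed through `contrasts.fit()` and `eBayes()`.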


Thank you all for your comments. As for sharing the design matrix, I am afraid it is too big to display here: it has a little over 300 rows. As you suggest, I could make a single variable that combines these levels, but what I am really interested in is the cell class comparison, 'control vs (all treatments)'. Anyway, the limma topTable results are fairly similar; I just wanted to make sure I was not doing something silly in case I publish my results. I think I will dig a little more carefully into the math involved and the limma source code.
