New To Limma And Linear Modeling, Is Limma Appropriate For My Data?

13.4 years ago

Ric ▴ 20

I have a fairly complex microarray experiment for which I think limma might be a good approach, but I'm having some difficulty figuring out how to define the problem in limma. I was hoping to get some advice on whether limma is actually appropriate or whether I should be looking at a different approach.

The data consist of microarrays probing ~20,000 genes.
The data come from 2 separate experiments that were batch corrected using CombatR.
Each patient was repeatedly measured at different timepoints: before and after treatment, and during an additional, intermediate treatment step (but only for one of the batches). This intermediate step is predicted to have an effect on the after treatment gene expression.
There are some missing samples so not all patients have the 2 or 3 timepoints which were originally measured.
Patients belong to 1 of 2 classifications prior to treatment.
Patients are classified as good or poor outcome (eventually we may assign multiple levels, but for now it is a binary classification).

So:

Batches: A, B
Timepoints: 1 (A & B), 2 (B), 3 (A & B)
Class: C, D
Outcome: G, P

In terms of the comparisons we’re most interested in:

1 vs 2, 1 vs 3 (maybe 2 vs 3)
C vs D at each timepoint (how they differ from each other and change over time)
A vs B @ timepoint 3 (as this defines the effect of additional treatment at timepoint 2), timepoint 1 could be a control for batch correction
G vs P @ any timepoint, but 1 being the most useful for prediction

I can also imagine other potential interactions that may be interesting:

Class vs Outcome
Batch vs Outcome
Batch vs Class

I do understand that batch and effect of treatment at timepoint 2 are confounded, but I don’t think there’s anything I can do about this other than propose a second validation study for any significant differences we find.
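The confounding can be made concrete with a minimal sketch in plain Python (not limma), assuming a simplified design that contains only batch and timepoint. The `carryover` column is a hypothetical indicator for "sample taken after the intermediate step"; it comes out identical to the batch-by-timepoint-3 interaction column, so adding it does not raise the rank of the design and the two effects cannot be separated.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix via Gaussian elimination over exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Observed (batch, timepoint) cells: timepoint 2 exists only in batch B.
cells = [("A", 1), ("A", 3), ("B", 1), ("B", 2), ("B", 3)]


def design_row(batch, time, with_carryover):
    is_b = int(batch == "B")
    t2 = int(time == 2)
    t3 = int(time == 3)
    # Columns: intercept, batch B, timepoint 2, timepoint 3, batch B x timepoint 3
    row = [1, is_b, t2, t3, is_b * t3]
    if with_carryover:
        # Hypothetical: effect of the intermediate step carried into timepoint 3
        row.append(int(batch == "B" and time == 3))
    return row


base = [design_row(b, t, False) for b, t in cells]
full = [design_row(b, t, True) for b, t in cells]

print("base:", len(base[0]), "columns, rank", matrix_rank(base))
print("full:", len(full[0]), "columns, rank", matrix_rank(full))
```

When the rank is smaller than the number of columns, at least one coefficient is not estimable, which is the situation described above.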

A few specific questions:

Should I define every possible contrast and interaction at the start in case any turn out to be of interest or is it acceptable to do a more broad exploratory pass then add contrasts and interactions for those factors that show significance?
Does doing this in limma correct for the multiple contrast/interaction comparisons (I understand I need to do a correction across the 20,000 genes, but I’m not clear on if/how to correct across contrasts/interactions)?
Can limma handle repeated measures and missing values (in one case all of timepoint 2 is missing from one of the batches)?

If limma is not appropriate, can you suggest a more appropriate method?

I would not be offended by any help in constructing the matrices… But I don’t expect anyone to do the work for me. If I know what I’m trying to do is feasible then I can figure it out.

Thanks!

limma microarray • 3.8k views

13.4 years ago

Ying W ★ 4.3k

The Bioconductor mailing list might be a more appropriate place to ask this: https://stat.ethz.ch/mailman/listinfo/bioconductor

1> I would start simple and add more interactions as you explore further; adding more interactions might make the results harder to explain.

2> I don't think you would need to correct for multiple interactions (how many interactions were you thinking of?!)

3> You might need to impute missing values using something like the k-nearest neighbor (k-NN) method, or just remove the patients with missing samples (unless that removes too many).
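On point 2, it may help to see what correcting across contrasts would actually change. The sketch below is plain Python with made-up p-values (4 genes, 2 contrasts), not limma output. It compares Benjamini-Hochberg adjustment done within each contrast against adjustment over all genes and contrasts pooled together. As far as I know, limma's `decideTests` offers a comparable choice through its `method` argument (`"separate"` or `"global"`), while `topTable` adjusts within one contrast at a time; check the documentation for details.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for illustration only: one list per contrast, one entry per gene.
pvals = {
    "1vs2": [0.001, 0.02, 0.04, 0.60],
    "1vs3": [0.015, 0.03, 0.50, 0.90],
}

# "Separate": adjust across genes within each contrast.
separate = {name: bh_adjust(p) for name, p in pvals.items()}

# "Global": pool every gene-by-contrast p-value, adjust once, then split again.
names = list(pvals)
n_genes = len(pvals[names[0]])
pooled_adj = bh_adjust([p for name in names for p in pvals[name]])
pooled = {
    name: pooled_adj[k * n_genes:(k + 1) * n_genes]
    for k, name in enumerate(names)
}

for name in names:
    print(name, "separate:", [round(p, 4) for p in separate[name]])
    print(name, "global:  ", [round(p, 4) for p in pooled[name]])
```

With only a handful of contrasts the two approaches usually give similar calls, so the difference matters more as the number of contrasts grows.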

To make the matrices, you might want to look into the limma manual (lots of very helpful examples).
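As a rough idea of what those matrices look like, here is a minimal sketch in plain Python rather than R, assuming a "one column per group" layout where class and timepoint are combined into a single factor. The sample list and the contrast names are hypothetical. In R the equivalent steps would typically go through `model.matrix(~0 + group)` and limma's `makeContrasts`.

```python
# Hypothetical sample annotation: (class, timepoint) for each array.
samples = [("C", 1), ("C", 3), ("D", 1), ("D", 3), ("C", 1), ("D", 3)]

# Combine the two factors into one group label per sample, e.g. "C.1".
labels = [f"{cls}.{time}" for cls, time in samples]
groups = sorted(set(labels))

# Design matrix: one row per sample, one indicator column per group.
design = [[int(label == g) for g in groups] for label in labels]


def contrast(weights):
    """Turn {group: weight} into a vector aligned with the design columns."""
    unknown = set(weights) - set(groups)
    if unknown:
        raise KeyError(f"unknown groups: {sorted(unknown)}")
    return [weights.get(g, 0) for g in groups]


contrasts = {
    # Change over time within each class
    "C: 3 vs 1": contrast({"C.3": 1, "C.1": -1}),
    "D: 3 vs 1": contrast({"D.3": 1, "D.1": -1}),
    # Class difference at baseline
    "C vs D at 1": contrast({"C.1": 1, "D.1": -1}),
    # Does the change over time differ between classes?
    "class x time": contrast({"C.3": 1, "C.1": -1, "D.3": -1, "D.1": 1}),
}

print("columns:", groups)
for row in design:
    print(row)
for name, vec in contrasts.items():
    print(name, vec)
```

Each contrast vector sums to zero, so it compares group means rather than estimating an overall level. Missing samples simply mean fewer rows; a group with no samples at all would have no column and could not appear in a contrast.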

Hope this helps!
