Question

Differentially expressed genes in imbalanced groups


9 months ago by Chris

Hi Biostars,

I have microarray data for 3 groups: normal control (around 2000 subjects), early symptom (65 subjects) and late symptom (45 subjects). Should I subsample the normal control group to balance the sample sizes? I use limma:

    library(limma)

    # group_vector: factor with levels NC, ES, LS (one entry per array)
    design <- model.matrix(~ 0 + group_vector)
    normalized_matrix <- normalizeBetweenArrays(numeric_matrix, method = "quantile")
    fit <- lmFit(normalized_matrix, design)
    contrast_matrix <- makeContrasts(NCvsES = group_vectorNC - group_vectorES,
                                     NCvsLS = group_vectorNC - group_vectorLS,
                                     ESvsLS = group_vectorES - group_vectorLS,
                                     levels = design)
    fit2 <- contrasts.fit(fit, contrast_matrix)
    fit2 <- eBayes(fit2)
    # With no coef= argument, topTable ranks genes by an F-test across all three contrasts
    results <- topTable(fit2, adjust.method = "BH", number = Inf)
    filtered_results <- results[results$adj.P.Val < 0.1, ]

Thank you so much!

Limma


score 3 · Accepted Answer · 2024-03-18


9 months ago by Gordon Smyth

Imbalanced group sizes are no problem. Just analyse the data as it is. No need to throw data away!

For a human study with a large number of subjects, I would suggest that you add sample quality weights:

    w <- arrayWeights(normalized_matrix, design)
    fit <- lmFit(normalized_matrix, design, weights=w)

You might also use robust=TRUE when running eBayes().
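Putting these suggestions together, a minimal end-to-end sketch might look like the following. It runs on simulated data only: the gene count, the seed, the group sizes (controls cut to 150 to keep the example fast) and the shift added to the first 50 genes are all made up for illustration, and the object names expr and res_NCvsLS are hypothetical. It assumes limma is installed from Bioconductor. It keeps every array (no subsampling), adds arrayWeights() sample weights, uses eBayes(robust = TRUE), and pulls out one contrast with topTable(coef = ...).

```r
# Minimal sketch on SIMULATED data; assumes limma is installed from Bioconductor.
library(limma)

set.seed(1)
n_genes <- 1000
# Deliberately imbalanced groups (hypothetical sizes), all kept in the analysis
group_vector <- factor(rep(c("NC", "ES", "LS"), times = c(150, 65, 45)),
                       levels = c("NC", "ES", "LS"))
expr <- matrix(rnorm(n_genes * length(group_vector), mean = 8),
               nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), NULL))
# Hypothetical true effect: first 50 genes are 1 log-unit higher in LS
expr[1:50, group_vector == "LS"] <- expr[1:50, group_vector == "LS"] + 1

design <- model.matrix(~ 0 + group_vector)
colnames(design) <- levels(group_vector)  # columns renamed to NC, ES, LS

# Sample quality weights, then a weighted linear model fit
w <- arrayWeights(expr, design)
fit <- lmFit(expr, design, weights = w)

contrast_matrix <- makeContrasts(NCvsES = NC - ES, NCvsLS = NC - LS,
                                 ESvsLS = ES - LS, levels = design)
fit2 <- contrasts.fit(fit, contrast_matrix)
# robust = TRUE protects the variance prior against outlying gene variances
fit2 <- eBayes(fit2, robust = TRUE)

# One table per contrast via coef=; without coef, topTable reports an F-test
res_NCvsLS <- topTable(fit2, coef = "NCvsLS", adjust.method = "BH", number = Inf)
head(res_NCvsLS)
```

Renaming the design columns to NC, ES and LS only makes the contrasts shorter to write; it is equivalent to the group_vectorNC - group_vectorES form in the question.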



That sounds super interesting. I have never heard of this approach before. When you say 'sample quality weights', are you referring to differences in group size or to the actual quality of each sample? Would this be similar to including an offset term?

9 months ago by Chris Dean


I do mean the quality of each sample (as measured by residual variability in the linear model, in other words by consistency with other samples belonging to the same treatment group). Nothing to do with group sizes or offsets.

To learn more, see Chapter 14 of the limma User's Guide. Or type ?arrayWeights and look up the paper that is referenced. For RNA-seq, see ?voomWithQualityWeights or ?voomLmFit.
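To illustrate what "quality" means here, the sketch below simulates two groups of six arrays and adds extra noise to one array (sample 3, chosen arbitrarily). Everything is simulated and assumes limma is installed. Because that array's residuals are less consistent with the other arrays in its group, arrayWeights() should give it the smallest weight; group sizes play no role.

```r
# Minimal sketch on SIMULATED data; assumes limma is installed from Bioconductor.
library(limma)

set.seed(2)
group <- factor(rep(c("A", "B"), each = 6))  # two groups of 6 arrays
design <- model.matrix(~ group)

# 2000 genes x 12 arrays of pure noise (no true differences needed here)
y <- matrix(rnorm(2000 * 12), nrow = 2000)
# Hypothetical poor-quality array: sample 3 gets extra residual noise
y[, 3] <- y[, 3] + rnorm(2000, sd = 2)

w <- arrayWeights(y, design)
print(round(w, 2))
which.min(w)  # sample 3 is expected to receive the smallest weight
```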

9 months ago by Gordon Smyth


Wow, a reply from the author of the tool. Thank you so much!

9 months ago by Chris


Hi Gordon, can I apply this approach to pathway scores from the GSVA tool rather than to microarray or RNA-seq data?

9 months ago by Chris