Question

Differentially expressed genes in imbalanced groups

15 months ago

Chris ▴ 360

Hi Biostars,

I have microarray data for 3 groups: normal control (around 2000 subjects), early symptom (65 subjects), and late symptom (45 subjects). Should I subsample the normal control group to balance the sample sizes? I use limma:

    # Group-means design: one column per group (NC, ES, LS)
    design <- model.matrix(~ 0 + group_vector)
    normalized_matrix <- normalizeBetweenArrays(numeric_matrix, method = "quantile")
    fit <- lmFit(normalized_matrix, design)
    contrast_matrix <- makeContrasts(
        NCvsES = group_vectorNC - group_vectorES,
        NCvsLS = group_vectorNC - group_vectorLS,
        ESvsLS = group_vectorES - group_vectorLS,
        levels = design)
    fit2 <- contrasts.fit(fit, contrast_matrix)
    fit2 <- eBayes(fit2)
    # No coef is given, so topTable tests all three contrasts jointly (F-test)
    results <- topTable(fit2, adjust.method = "BH", number = Inf)
    filtered_results <- results[results$adj.P.Val < 0.1, ]

Thank you so much!

Limma • 1.3k views

score 3 · Accepted Answer · 2024-03-18

15 months ago

Gordon Smyth ★ 8.1k

Imbalanced group sizes are no problem. Just analyse the data as it is. No need to throw data away!

For a human study with a large number of subjects, I would suggest that you add sample quality weights:

    w <- arrayWeights(normalized_matrix, design)
    fit <- lmFit(normalized_matrix, design, weights=w)

You might also use robust=TRUE when running eBayes().
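
Putting the two suggestions together, a minimal sketch of the whole workflow might look like the following. The simulated matrix, the group sizes, and the choice to extract a single contrast with `coef = "NCvsES"` are illustrative assumptions and not part of the original question; in a real analysis `normalized_matrix` and `group_vector` would come from your own data.

```r
library(limma)

set.seed(1)

# Hypothetical simulated data: 200 probes, deliberately unbalanced groups
group_vector <- factor(rep(c("NC", "ES", "LS"), times = c(60, 10, 8)),
                       levels = c("NC", "ES", "LS"))
normalized_matrix <- matrix(rnorm(200 * length(group_vector)), nrow = 200)
rownames(normalized_matrix) <- paste0("probe", seq_len(200))

# Group-means design, as in the question
design <- model.matrix(~ 0 + group_vector)

# Sample quality weights, then a weighted linear model fit
w <- arrayWeights(normalized_matrix, design)
fit <- lmFit(normalized_matrix, design, weights = w)

contrast_matrix <- makeContrasts(
  NCvsES = group_vectorNC - group_vectorES,
  NCvsLS = group_vectorNC - group_vectorLS,
  ESvsLS = group_vectorES - group_vectorLS,
  levels = design
)
fit2 <- contrasts.fit(fit, contrast_matrix)

# Robust empirical Bayes moderation of the variances
fit2 <- eBayes(fit2, robust = TRUE)

# Results for one contrast at a time (assumed choice for this sketch)
res_NCvsES <- topTable(fit2, coef = "NCvsES", adjust.method = "BH", number = Inf)
head(res_NCvsES)
```

All samples are kept; the unequal group sizes are handled by the linear model itself, and the weights only reflect per-sample quality.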

That sounds super interesting. I have never heard of this approach before. When you say 'sample quality weights', are you referring to differences in group size or to the actual quality of each sample? Would this be similar to including an offset term?

ADD REPLY • link 15 months ago by Chris Dean ▴ 420

I do mean the quality of each sample (as measured by residual variability in the linear model, in other words by consistency with other samples belonging to the same treatment group). Nothing to do with group sizes or offsets.

To learn more, see Chapter 14 of the limma User's Guide. Or type ?arrayWeights and look up the paper that is referenced. For RNA-seq, see ?voomWithQualityWeights or ?voomLmFit.
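
To see what the weights measure, here is a small simulated illustration (all data and object names are made up for this sketch). One sample is given extra noise, and `arrayWeights()` would be expected to give it the lowest weight, whichever group it sits in and however many samples that group has.

```r
library(limma)

set.seed(42)

# Two deliberately unbalanced groups (hypothetical labels)
group <- factor(rep(c("A", "B"), times = c(20, 6)))
design <- model.matrix(~ 0 + group)

# 500 probes of pure noise with standard deviation 1
y <- matrix(rnorm(500 * length(group)), nrow = 500)

# Make sample 3 noisier than the rest (standard deviation 3)
noisy <- 3
y[, noisy] <- rnorm(500, sd = 3)

# Estimate one relative quality weight per sample
w <- arrayWeights(y, design)
round(w, 2)

# Index of the sample with the lowest weight
which.min(w)
```

Lower weights mark samples that agree less well with the rest of their group; passing the vector to `lmFit()` down-weights those samples instead of discarding them.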

ADD REPLY • link 15 months ago by Gordon Smyth ★ 8.1k

Wow, a reply from the author of the tool. Thank you so much!

ADD REPLY • link 15 months ago by Chris ▴ 360

Hi Gordon, may I apply this approach to pathway scores from the GSVA tool, rather than to microarray or RNA-seq data?

ADD REPLY • link 14 months ago by Chris ▴ 360