leranwangcs · 18 days ago
Hi,
I have been trying to design a limma model for my analysis. I tried two different datasets: one without prevalence filtering, which contains 6333 contigs, and one with 3% prevalence filtering, which has only 182 contigs left. My model design is:
model.design <- model.matrix(~ Dx.Status + Country + month + solid_food + genotype + feeding_type, data = metadata.clean)
model.fit <- voomLmFit(contig.abundance.table, model.design, plot = FALSE)
model.fit <- eBayes(model.fit)
sig.model.res <- topTable(model.fit, coef = "DiseaseOn", number = Inf) %>% filter(adj.P.Val < 0.05)
However, I got 0 significant contigs.
For the other dataset, the one that contains 6333 contigs, I used the same model but got 6280 significant contigs. When I checked the result table, 4000+ of the 6280 contigs had exactly the same adjusted p-value.
I wonder whether there is anything wrong with my model design, and how to interpret these two outputs.
Thanks!
The code that you show cannot be correct, because the topTable coef refers to a term that is not in the design matrix. The code as shown will give an error.
Also, I don't know what you mean by "prevalence filtering". You would need to give more context.
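For context on why that coefficient name cannot match: R's model.matrix names each dummy column by pasting the factor name onto the non-reference level, so a factor Dx.Status with (hypothetical) levels Off/On produces a column named Dx.StatusOn, not DiseaseOn. A minimal language-neutral sketch of that treatment-coding convention, using made-up sample labels:

```python
# Sketch of treatment (dummy) coding as R's model.matrix applies it:
# one intercept column plus one indicator per non-reference level,
# each named <factor><level>. All data below are hypothetical.
levels = ["Off", "On"]                # assumed levels; "Off" = reference
samples = ["Off", "On", "On", "Off"]  # Dx.Status value for each sample

colnames = ["(Intercept)"] + [f"Dx.Status{lvl}" for lvl in levels[1:]]
design = [[1] + [int(s == lvl) for lvl in levels[1:]] for s in samples]

print(colnames)  # ['(Intercept)', 'Dx.StatusOn']
print(design)    # [[1, 0], [1, 1], [1, 1], [1, 0]]
```

Passing any other string (such as "DiseaseOn") as coef cannot be matched against these column names, which is why topTable errors out.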
As a more minor comment, you could set p = 0.05 in topTable, then you wouldn't need filter().

Thanks! There was a typo in the model, and it was supposed to be:
By prevalence filtering, I mean I only keep those contigs that exist in more than 3% of the samples. This step caused a dramatic drop in the number of contigs (6333 contigs -> 182 contigs).
That alone does not necessarily indicate a problem. There could in reality be no differences, or noise/unwanted variation may keep you from detecting them.
That strongly suggests a flaw in the code, the dataset, or the normalization. Literally all genes being differential is inherently incompatible with how omics data work, since they are usually a relative readout. This often happens, for example, when people use an intercept-less design and then test the wrong coefficients.
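To illustrate the intercept-less pitfall with made-up numbers: in a two-group model with an intercept, the group coefficient is the difference of the group means, whereas in an intercept-less design each coefficient is a group mean itself, which for abundance-like data is far from zero for essentially every feature. A sketch in plain Python (hypothetical data; for a single two-level factor, the least-squares coefficients reduce exactly to these group means):

```python
import random
random.seed(0)

# Hypothetical log-abundances: both groups sit around 10,
# with a tiny true group difference of 0.05.
mean = lambda xs: sum(xs) / len(xs)
g0 = [10.00 + random.gauss(0, 0.1) for _ in range(5)]
g1 = [10.05 + random.gauss(0, 0.1) for _ in range(5)]

# Design with intercept (~ group): the group coefficient is the
# DIFFERENCE of group means -- small, plausibly non-significant.
diff_coef = mean(g1) - mean(g0)

# Design without intercept (~ 0 + group): each coefficient is a
# group MEAN -- hugely nonzero, so naively testing it against
# zero flags essentially every feature as "significant".
mean_coef = mean(g1)

print(abs(diff_coef) < 1, mean_coef > 5)  # True True
```

Testing mean_coef against zero is the kind of mistake that makes 6280 of 6333 features come out significant.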
Ties in adj.P.Val are not worrying on their own, as the BH correction often produces them. This assumes the code and data were handled correctly.
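Why BH produces ties: the adjusted value at rank i is p_i * n / i, then monotonicity is enforced by a running minimum taken from the largest p-value downwards, which can collapse several raw p-values onto the same adjusted value. A self-contained sketch with made-up p-values (same procedure as R's p.adjust(method = "BH")):

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adj = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest, applying
    # a cumulative minimum to keep the adjusted values monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adj[i] = running_min
    return adj

pvals = [0.01, 0.011, 0.012, 0.02, 0.8]
adj = bh_adjust(pvals)
# The three smallest raw p-values collapse to the same
# adjusted value (~0.02): a three-way tie from the cummin step.
print(adj[0] == adj[1] == adj[2])  # True
```

So thousands of identical adj.P.Val entries can be a perfectly normal artifact of the correction, not evidence of a bug by itself.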
I cannot tell; you don't show reproducible code. As Gordon Smyth says, the code as posted should error because the coefficient is not in the design matrix.
Summarizing my comments above, I assume what we have here is a combination of improper code/model use, normalization, and dataset handling.
It is not clear to me how many samples (columns) there are in contig.abundance.table, given that the model is fitting at least 6 coefficients. "Keep those contigs that exist in more than 3% of the samples" may imply that there are a lot of zeros in the table. What is the overall percentage (or count) of zeros after the filtering?
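For reference, the zero fraction this comment asks about can be read straight off the table; a trivial sketch on a hypothetical contigs-by-samples matrix:

```python
# Hypothetical abundance matrix (rows = contigs, columns = samples).
table = [
    [0, 3, 0, 7],
    [5, 0, 0, 0],
    [0, 0, 1, 2],
]
values = [v for row in table for v in row]
zero_count = values.count(0)
zero_frac = zero_count / len(values)
print(zero_count, round(zero_frac, 2))  # 7 0.58
```

A high zero fraction even after filtering would suggest the count model behind voom is being stretched, which matters for interpreting both result sets.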