leranwangcs · 18 days ago
Hi,
I have been trying to design a limma model for my analysis. I tried two different datasets: one without prevalence filtering, which contains 6333 contigs, and one with 3% prevalence filtering, which has only 182 contigs left. My model design is:
model.design <- model.matrix(~ Dx.Status + Country + month + solid_food + genotype + feeding_type, data = metadata.clean)
model.fit <- voomLmFit(contig.abundance.table, model.design, plot = FALSE)
model.fit <- eBayes(model.fit)
sig.model.res <- topTable(model.fit, coef = "DiseaseOn", number = Inf) %>% filter(adj.P.Val < 0.05)
However, I got 0 significant contigs.
For the other dataset, the one that contains 6333 contigs, I used the same model but got 6280 significant contigs. When I checked the result table, 4000+ of the 6280 contigs had exactly the same adjusted p-value.
I wonder whether there is anything wrong with my model design, and how to interpret these two outputs.
Thanks!
The code that you show cannot be correct, because the topTable coef refers to a term that is not in the design matrix. The code as shown will give an error.
Also, I don't know what you mean by "prevalence filtering". You would need to give more context.
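For context on why that coefficient name cannot match: R's model.matrix names each dummy column by pasting the factor name onto the non-reference level, so a factor Dx.Status with (hypothetical) levels Off/On produces a column named Dx.StatusOn, not DiseaseOn. A minimal language-neutral sketch of that treatment-coding convention, using made-up sample labels:

```python
# Sketch of treatment (dummy) coding as R's model.matrix applies it:
# one intercept column plus one indicator per non-reference level,
# each named <factor><level>. All data below are hypothetical.
levels = ["Off", "On"]                # assumed levels; "Off" = reference
samples = ["Off", "On", "On", "Off"]  # Dx.Status value for each sample

colnames = ["(Intercept)"] + [f"Dx.Status{lvl}" for lvl in levels[1:]]
design = [[1] + [int(s == lvl) for lvl in levels[1:]] for s in samples]

print(colnames)  # ['(Intercept)', 'Dx.StatusOn']
print(design)    # [[1, 0], [1, 1], [1, 1], [1, 0]]
```

Passing any other string (such as "DiseaseOn") as coef cannot be matched against these column names, which is why topTable errors out.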
As a more minor comment, you could set p = 0.05 in topTable, then you wouldn't need filter().

Thanks! There was a typo in the model, and it was supposed to be:
By prevalence filtering, I mean I only keep those contigs that exist in more than 3% of the samples. This step caused a dramatic drop in the number of contigs (6333 contigs -> 182 contigs).
That alone does not necessarily indicate a problem. There could in reality be no differences, or noise/unwanted variation may keep you from detecting them.
That strongly suggests a flaw in the code, the dataset, or the normalization. Literally all genes being differential is inherently incompatible with how omics data work, since they are usually a relative readout. This often happens, for example, when people use an intercept-less design and then test the wrong coefficients.
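To illustrate the intercept-less pitfall with made-up numbers: in a two-group model with an intercept, the group coefficient is the difference of the group means, whereas in an intercept-less design each coefficient is a group mean itself, which for abundance-like data is far from zero for essentially every feature. A sketch in plain Python (hypothetical data; for a single two-level factor, the least-squares coefficients reduce exactly to these group means):

```python
import random
random.seed(0)

# Hypothetical log-abundances: both groups sit around 10,
# with a tiny true group difference of 0.05.
mean = lambda xs: sum(xs) / len(xs)
g0 = [10.00 + random.gauss(0, 0.1) for _ in range(5)]
g1 = [10.05 + random.gauss(0, 0.1) for _ in range(5)]

# Design with intercept (~ group): the group coefficient is the
# DIFFERENCE of group means -- small, plausibly non-significant.
diff_coef = mean(g1) - mean(g0)

# Design without intercept (~ 0 + group): each coefficient is a
# group MEAN -- hugely nonzero, so naively testing it against
# zero flags essentially every feature as "significant".
mean_coef = mean(g1)

print(abs(diff_coef) < 1, mean_coef > 5)  # True True
```

Testing mean_coef against zero is the kind of mistake that makes 6280 of 6333 features come out significant.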
Ties in adj.P.Val are not worrying on their own, as the BH correction often produces them. This assumes the code and data were handled correctly.
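Why BH produces ties: the adjusted value at rank i is p_i * n / i, then monotonicity is enforced by a running minimum taken from the largest p-value downwards, which can collapse several raw p-values onto the same adjusted value. A self-contained sketch with made-up p-values (same procedure as R's p.adjust(method = "BH")):

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adj = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest, applying
    # a cumulative minimum to keep the adjusted values monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adj[i] = running_min
    return adj

pvals = [0.01, 0.011, 0.012, 0.02, 0.8]
adj = bh_adjust(pvals)
# The three smallest raw p-values collapse to the same
# adjusted value (~0.02): a three-way tie from the cummin step.
print(adj[0] == adj[1] == adj[2])  # True
```

So thousands of identical adj.P.Val entries can be a perfectly normal artifact of the correction, not evidence of a bug by itself.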
I cannot tell; you don't show reproducible code. As Gordon Smyth says, the code as posted should error because the coefficient is not in the design matrix.
Summarizing my comments above, I assume what we have here is a combination of improper code/model use, normalization, and dataset handling.
It is not clear to me how many samples (columns) there are in contig.abundance.table, given that the model is fitting at least 6 coefficients. "Keep those contigs that exist in more than 3% of the samples" may imply that there are a lot of zeros in the table. What is the overall percentage (or count) of zeros after the filtering?
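For reference, the zero fraction this comment asks about can be read straight off the table; a trivial sketch on a hypothetical contigs-by-samples matrix:

```python
# Hypothetical abundance matrix (rows = contigs, columns = samples).
table = [
    [0, 3, 0, 7],
    [5, 0, 0, 0],
    [0, 0, 1, 2],
]
values = [v for row in table for v in row]
zero_count = values.count(0)
zero_frac = zero_count / len(values)
print(zero_count, round(zero_frac, 2))  # 7 0.58
```

A high zero fraction even after filtering would suggest the count model behind voom is being stretched, which matters for interpreting both result sets.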