I have 18 samples described by two factors: Infestation, with 2 levels (control vs. affected), and timepoint, with 3 levels (24h, 48h, 96h). Each Infestation-by-timepoint combination has 3 biological replicates.
When I run:
counts <- read.delim("counts.csv", header = TRUE, row.names = 1, sep = ",")
dim(counts)
# [1] 27868    18
colData <- read.delim("colData.csv", header = TRUE, sep = ",", row.names = 1)
dds <- DESeqDataSetFromMatrix(countData = counts, colData = colData, design = ~ Infestation + timepoint)
dds <- DESeq(dds)
res <- results(dds)
de_genes <- rownames(res)[which(res$padj < 0.5 & abs(res$log2FoldChange) > 1)][1:50]
With the padj and log2FoldChange cutoffs shown above, this code returns very few genes (fewer than 10).
However, if I change the design formula to include only one factor (Infestation) and drop timepoint, I get the 50 genes I asked for.
It is unclear which comparison you are trying to make in your differential expression analysis. You need to define it before you can set up the model design and the call to the results function. In the code you shared, you did not specify a contrast when running results(dds), so DESeq2 reports the comparison for the last variable in the design formula, which here is timepoint rather than Infestation. Also, if you are going to filter on fold change, you should apply log fold change shrinkage with lfcShrink first.
I recommend reviewing the DESeq2 vignette: http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html
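As a rough sketch of the advice above, the following shows one way to ask for the Infestation effect explicitly while keeping timepoint in the model. It assumes the comparison of interest is affected vs. control, that the level labels in colData.csv are literally "control", "affected", "24h", "48h" and "96h", and that a padj cutoff of 0.05 was intended (the question used 0.5). None of this is confirmed by the original post, so adjust as needed.

```r
library(DESeq2)

# Load the inputs as in the question.
counts  <- read.delim("counts.csv", header = TRUE, row.names = 1, sep = ",")
colData <- read.delim("colData.csv", header = TRUE, row.names = 1, sep = ",")

# ASSUMPTION: these level labels match the strings in colData.csv.
# Setting "control" as the reference makes fold changes read affected vs. control.
colData$Infestation <- relevel(factor(colData$Infestation), ref = "control")
colData$timepoint   <- factor(colData$timepoint, levels = c("24h", "48h", "96h"))

# Keep timepoint in the design so its effect is accounted for.
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData   = colData,
                              design    = ~ timepoint + Infestation)
dds <- DESeq(dds)

# List the coefficients that were actually fitted.
resultsNames(dds)

# Name the comparison explicitly instead of relying on the default.
infestation_contrast <- c("Infestation", "affected", "control")
padj_cutoff <- 0.05  # ASSUMPTION: the question used 0.5
res <- results(dds, contrast = infestation_contrast, alpha = padj_cutoff)
summary(res)

# Shrink the log2 fold changes before filtering on them.
# type = "normal" is used here only because it accepts a contrast
# without extra packages; the vignette describes other estimators.
res_shrunk <- lfcShrink(dds, contrast = infestation_contrast, type = "normal")

# Filter, order by adjusted p-value, and take up to 50 genes.
sig <- subset(res_shrunk, padj < padj_cutoff & abs(log2FoldChange) > 1)
sig <- sig[order(sig$padj), ]
de_genes <- head(rownames(sig), 50)
```

If the question is instead whether the infestation effect differs between timepoints, the design would need an interaction term, and the contrast would change accordingly; the vignette linked above covers that case.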