Entering edit mode
15 months ago
Ann
▴
40
How to extract list of DE genes after EdgeR? With summary() I get number of Up and Down expressed genes. And I would like to extract the gene IDs for Up and Down. As I read from other manuals, the common way is to use filtering $table$logFC > 0 for up and $table$logFC < 0 for down regulated after applying TopTags() (see My pipeline). If I use this way with I get the same amount of DE genes as with summary().
But I did not understand why we use this way. Why we dont use filtering $table$logFC >1.5 for up and $table$logFC <1.5 for down if we use LogFC cutoff 1.5?
My pipeline:
# filtering
keep <- filterByExpr(y, design)
y_filtered <- y[keep, , keep.lib.sizes=FALSE]
# to estimate dispersion
y_disp_design <- estimateDisp(y_filtered, design = design, verbose=TRUE)
# Fit a quasi-likelihood negative binomial generalized log-linear model to count data
fit <- glmQLFit(y_disp_design, design, robust = TRUE)
# Conduct genewise statistical tests
LFC <- 1.5
tr.c1_2 <- glmTreat (fit, contrast = my.contrasts[, 'c1_2'], lfc = log2(LFC))
# Identify which genes are significantly differentially expressed
is.de.tr.c1_2 <- decideTestsDGE(tr.c1_2)
# View the amount of Up and Down genes
summary(is.de.tr.c1_2)
Down 3191
NotSig 13930
Up 1077
# Extracts the most differentially expressed genes
TT_tr.c1_2 <- topTags(tr.c1_2, Inf)
# View TT_tr.c1_2
TT_tr.c1_2
Coefficient: 1*stage_1 -1*stage_2
logFC unshrunk.logFC logCPM PValue FDR
TRINITY_DN583_c0_g1 -3.254507 -3.255309e+00 6.980710 1.123231e-11 2.044055e-07
TRINITY_DN1301_c4_g1 -5.887508 -5.921260e+00 3.852718 2.272249e-11 2.067519e-07
...
# extract Up genes
up_c1_2 <- row.names(TT_tr.c1_2$table[TT_tr.c1_2$table$FDR < 0.05 &
TT_tr.c1_2$table$logFC > 0, ])
length(up_c1_2)
1077