I want to perform differential analysis on RNA-seq data.
How do I create the design matrix for the limma package? I created the design dataframe using sample ID and class/phenotype as columns. However, my code raised the error "No group or design set. Assuming all samples belong to one group."
Data processing:
dat <- read.csv("../input/mrna-clin-csv/mrna_clin_kipan.csv")
df <- dat[,-1]
rownames(df) <- dat[,1]
subtype <- df$subtype
df <- df[, -c(1:7)] # Drop the first 7 columns
df <- na.omit(df)
df <- df[order(rownames(df)), ]
The design dataframe, where the class/design is the subtype column:
design <- cbind(rownames(df), df["subtype"])
# Provide column names
colnames(design) <- c("samples", "subtype")
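For reference, limma design matrices are usually numeric matrices built with model.matrix() from a factor, rather than a data frame of IDs and labels. A minimal sketch, assuming made-up sample names and subtype labels (pheno, group, and design_mat are hypothetical names, not objects from my data):

```r
# Hypothetical phenotype table: six samples, three made-up subtypes
pheno <- data.frame(
  samples = paste0("sample", 1:6),
  subtype = c("A", "A", "B", "B", "C", "C")
)

# Encode the subtype as a factor, then expand it into indicator columns.
# "~ 0 + group" drops the intercept so each subtype gets its own column.
group <- factor(pheno$subtype)
design_mat <- model.matrix(~ 0 + group)

# Tidy the dimnames so columns are subtypes and rows are samples
colnames(design_mat) <- levels(group)
rownames(design_mat) <- pheno$samples

print(design_mat)
```

Each row of design_mat corresponds to one sample, so its row order would need to match the column order of the count matrix.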
Further data processing:
df <- df[1:(length(df)-2)] # Drop the last two columns
mat <- data.matrix(df)
mat <- t(mat)
rownames(mat) <- sub("\\..*", "", rownames(mat)) # Keep substring before "." character
mat <- mat[!rownames(mat) %like% "X.*", ] # Drop rows whose names match the regex "X.*" (%like% comes from data.table)
Limma:
library(edgeR) # Provides DGEList() and filterByExpr(); also attaches limma
# Create a DGEList object
dge <- DGEList(counts=mat)
# Remove rows that consistently have zero or very low counts
keep <- filterByExpr(as.numeric(unlist(dge), design))
Traceback:
No group or design set. Assuming all samples belong to one group.
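For comparison, a minimal sketch of the call shape I understand filterByExpr() to expect: the DGEList itself as the first argument and the design matrix as a separate design argument. The counts and subtype labels below are simulated, and all object names are hypothetical rather than taken from my data:

```r
library(edgeR)  # also attaches limma

set.seed(1)
# Simulated counts: 200 genes (rows) x 6 samples (columns); values are made up
counts <- matrix(
  rnbinom(200 * 6, mu = 50, size = 5),
  nrow = 200,
  dimnames = list(paste0("gene", 1:200), paste0("s", 1:6))
)

# Hypothetical two-group layout, expanded into a numeric design matrix
subtype <- factor(c("A", "A", "A", "B", "B", "B"))
design_mat <- model.matrix(~ 0 + subtype)

dge <- DGEList(counts = counts)

# Pass the DGEList and the design as two separate arguments
keep <- filterByExpr(dge, design = design_mat)

# Subset to the retained genes and recompute library sizes
dge <- dge[keep, , keep.lib.sizes = FALSE]
```

Here keep is a logical vector with one entry per gene, which is then used to subset the rows of the DGEList.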