Create the design matrix for limma differential expression analysis.
2.2 years ago

I want to perform differential expression analysis on RNA-seq data.

How do I create the design matrix for the limma package? I created a design data frame with the sample ID and the class/phenotype as columns. However, my code produces the message "No group or design set. Assuming all samples belong to one group."

Data processing:

dat <- read.csv("../input/mrna-clin-csv/mrna_clin_kipan.csv")
df <- dat[,-1]
rownames(df) <- dat[,1]
subtype <- df$subtype
df <- df[, -c(1:7)] # Drop the first 7 columns
df <- na.omit(df)
df <- df[order(rownames(df)), ]

The design data frame, in which the class (phenotype) is the subtype column:

design <- cbind(rownames(df), df["subtype"])
#provide column names
colnames(design) <- c("samples", "subtype")
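One thing worth checking in the code above: `subtype <- df$subtype` runs before `na.omit()` and the reordering by row name, so that vector no longer follows the rows of `df` once samples are dropped or sorted. The sketch below uses a small hypothetical data frame (not the real `mrna_clin_kipan.csv`) to show one way of building the sample information after filtering and sorting, and checking that it lines up with the columns of the count matrix.

```r
# Hypothetical toy data standing in for the real CSV:
# one row per sample, a phenotype column plus two expression columns.
df <- data.frame(
  subtype = c("KIRC", "KIRP", "KICH", "KIRC"),
  GENE1   = c(5, NA, 7, 2),
  GENE2   = c(1, 3, 4, 8),
  row.names = c("S3", "S1", "S4", "S2"),
  stringsAsFactors = FALSE
)

# Drop incomplete samples and sort them BEFORE extracting the phenotype,
# so the sample information stays in the same order as the expression data.
df <- na.omit(df)
df <- df[order(rownames(df)), ]

# Sample information ("targets" frame): one row per sample.
targets <- data.frame(
  samples = rownames(df),
  subtype = factor(df$subtype),
  stringsAsFactors = FALSE
)

# Expression part only, transposed to genes x samples as limma/edgeR expect.
mat <- t(data.matrix(df[, setdiff(colnames(df), "subtype")]))

# Sanity check: each targets row must describe the matching matrix column.
stopifnot(identical(targets$samples, colnames(mat)))
```

Here sample S1 is removed by `na.omit()`, and the remaining samples come out as S2, S3, S4 in both `targets` and `mat`.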

Further data processing:

df <- df[1:(length(df)-2)] # Drop the last two columns
mat <- data.matrix(df)
mat <- t(mat)
rownames(mat) <- sub("\\..*", "", rownames(mat)) # Keep substring before "." character
library(data.table) # provides the %like% operator
mat <- mat[!rownames(mat) %like% "X.*", ]
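A note on the two name-cleaning lines above: `sub("\\..*", "", x)` keeps everything before the first ".", and data.table's `%like%` is an unanchored regular expression, so `"X.*"` matches any name that contains a capital X anywhere (for example "XIST"), not only names that start with X. The sketch below uses hypothetical gene names and assumes the intent was to drop only names of the form "X" or "X" followed by digits (the prefix that `make.names()`, used by `read.csv()`, adds to column names starting with a digit). It uses base `grepl()` with an anchored pattern instead.

```r
# Hypothetical row names as they might look after read.csv() and t().
ids <- c("TP53.7157", "XIST.7503", "X1.2", "VHL.7428", "X")

# Keep only the part before the first "." (same as the original sub() call).
clean <- sub("\\..*", "", ids)
clean                        # "TP53" "XIST" "X1" "VHL" "X"

# Unanchored pattern, equivalent to %like% "X.*": also matches "XIST".
grepl("X.*", clean)          # FALSE TRUE TRUE FALSE TRUE

# Anchored pattern (assumed intent): only "X" or "X" + digits.
grepl("^X[0-9]*$", clean)    # FALSE FALSE TRUE FALSE TRUE

# Apply it to a toy genes x samples matrix; drop = FALSE keeps it a matrix.
mat <- matrix(1:10, nrow = 5, dimnames = list(clean, c("S1", "S2")))
mat <- mat[!grepl("^X[0-9]*$", rownames(mat)), , drop = FALSE]
rownames(mat)                # "TP53" "XIST" "VHL"
```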

Limma:

library(edgeR) # provides DGEList() and filterByExpr()
# create a DGEList object
dge <- DGEList(counts=mat)

Remove rows (genes) that consistently have zero or very low counts:

keep <- filterByExpr(as.numeric(unlist(dge), design))

Output:

No group or design set. Assuming all samples belong to one group.

2.2 years ago
Gordon Smyth

See "A guide to creating design matrices for gene expression experiments" for an extensive tutorial on creating design matrices.

Design matrices are created by the model.matrix function in R. The data.frame that you have called design in your code is actually just a sample information data.frame (called a targets frame in the limma documentation). You would create the design matrix from the sample information, but the sample information is not a design matrix itself.
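To make the distinction concrete, here is a small sketch with a hypothetical targets frame of six samples (the subtype labels are only illustrative). The targets frame holds the sample information; `model.matrix()` turns the `subtype` factor into the numeric design matrix that limma and edgeR functions take.

```r
# Hypothetical sample information (targets frame) for six samples.
targets <- data.frame(
  samples = paste0("S", 1:6),
  subtype = factor(c("KICH", "KICH", "KIRC", "KIRC", "KIRP", "KIRP")),
  stringsAsFactors = FALSE
)

# Design matrix: one row per sample, one column per coefficient.
# "~ 0 + subtype" gives one indicator column per subtype (group means).
design <- model.matrix(~ 0 + subtype, data = targets)
colnames(design) <- levels(targets$subtype)
rownames(design) <- targets$samples
design
```

With `~ subtype` instead, the first column would be an intercept for the reference level (here KICH) and the remaining columns would be differences from it. Either parametrization can be used; the choice only changes how comparisons are later expressed.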

You are getting the message about no design being set because your code

keep <- filterByExpr(as.numeric(unlist(dge), design))

uses the unlist() and as.numeric() functions to destroy both the DGEList and the design matrix, converting them both into one long numeric vector that doesn't have any meaning. Why would you do that?
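For comparison, a minimal sketch of the intended call, with the DGEList and the design matrix passed to edgeR's `filterByExpr()` as two separate, unconverted arguments. The counts are simulated toy data (4 genes, 6 samples, two hypothetical subtypes), not the poster's data; gene 4 has no counts and is the one expected to be filtered out.

```r
library(edgeR)   # DGEList() and filterByExpr() come from edgeR

# Simulated toy counts: 4 genes x 6 samples, two groups of three.
set.seed(1)
counts <- matrix(rpois(24, lambda = 50), nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("S", 1:6)))
counts[4, ] <- 0                          # one gene with no counts at all
subtype <- factor(rep(c("KIRC", "KIRP"), each = 3))

design <- model.matrix(~ 0 + subtype)
dge <- DGEList(counts = counts)

# The DGEList and the design go in as they are: no unlist(), no as.numeric().
keep <- filterByExpr(dge, design)
dge <- dge[keep, , keep.lib.sizes = FALSE]   # recompute library sizes
```

Alternatively, supplying `group = subtype` when creating the DGEList lets `filterByExpr(dge)` take the groups from `dge$samples$group` without a design matrix.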
