R Model Matrix has extra columns
6.5 years ago
tpham2654 • 0

I am trying to do differential expression analysis on some RNA microarray data. I was setting up my model matrix for limma from a csv file which has info on the samples, specifically how they were to be grouped (cre/flox status). Some example data is below:

geo_name,cre_lox,cell_type,treatment1,replicate_num
sample1,flox,c1,no,1
sample2,flox,c1,no,2
sample3,flox,c1,no,3
sample4,cre,c1,no,1
sample5,cre,c1,no,2
sample6,cre,c1,no,3
sample7,wt,c2,no,1
sample8,wt,c2,yes,1

The subset of the data I want (where cell_type=c1) has only "cre" and "flox" in the "cre_lox" column.
I selected for it using:

q1a_selected_col_data = col_data[(col_data$cell_type == 'c1'),]

However, when I used the function model.matrix(~q1a_selected_col_data$cre_lox) it results in a matrix like this:

(Intercept) q1a_selected_col_data$cre_loxflox q1a_selected_col_data$cre_loxwt
1         1                                 1                               0
2         1                                 1                               0
3         1                                 1                               0
4         1                                 0                               0
5         1                                 0                               0
6         1                                 0                               0

How did it "know" to add a column for "wt" status even though the data I passed to it does not have "wt" in it? Is there a way I can prevent things like this without having to modify the csv or remove columns from the model matrix by hand?

R RNA-Seq Limma • 2.6k views
6.5 years ago
russhh 5.7k

Your cre_lox column is stored as a factor. Subsetting the rows of a data frame does not remove factor levels that no longer occur, so "wt" is still a level of cre_lox in your subset. model.matrix adds a column for every non-reference level of a factor, even if no sample has that level.

To avoid problems like this, you could call droplevels on the data frame after subsetting it.
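
A minimal sketch of that behaviour, assuming the sample sheet was read in with strings converted to factors. The data frame below is a hand-built stand-in for the CSV in the question, and passing `data =` to model.matrix is an optional extra that shortens the column names:

```r
# Toy copy of the sample sheet from the question (cre_lox stored as a factor).
col_data <- data.frame(
  geo_name  = paste0("sample", 1:8),
  cre_lox   = factor(c("flox", "flox", "flox", "cre", "cre", "cre", "wt", "wt")),
  cell_type = factor(c(rep("c1", 6), "c2", "c2"))
)

q1a_selected_col_data <- col_data[col_data$cell_type == "c1", ]

# Row subsetting keeps the unused "wt" level on the factor.
levels_before <- levels(q1a_selected_col_data$cre_lox)
levels_before  # "cre" "flox" "wt"

# droplevels() removes levels that no longer occur in the subset.
q1a_selected_col_data <- droplevels(q1a_selected_col_data)
levels(q1a_selected_col_data$cre_lox)  # "cre" "flox"

# With data = ..., the column names no longer carry the data frame name.
design <- model.matrix(~ cre_lox, data = q1a_selected_col_data)
colnames(design)  # "(Intercept)" "cre_loxflox"
```

droplevels also accepts a single factor, so `droplevels(q1a_selected_col_data$cre_lox)` would work if only that column matters.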


Thanks. I found that the design matrix is the inverse of what I want. Basically the 1 and 0 in the q1a_selected_col_data$cre_loxflox column should be switched since cre is the experimental group.

I tried model.matrix(~q1a_selected_col_data$cre_lox-1) after using droplevels to invert it and now I get this:

 q1a_selected_col_data$cre_loxflox q1a_selected_col_data$cre_loxcre
1                                1                                0
2                                1                                0
3                                1                                0
4                                0                                1
5                                0                                1
6                                0                                1

I want the last column. Is there a way I can select the label model.matrix should mark as "1"?


relevel?
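
A short sketch of what relevel would do here, assuming the factor has already been reduced to the two levels "cre" and "flox". The vector is typed in by hand to mirror the six c1 samples:

```r
# The six c1 samples from the question, in the same order.
cre_lox <- factor(c("flox", "flox", "flox", "cre", "cre", "cre"))

# Levels are sorted alphabetically by default, so "cre" is the reference level.
levels(cre_lox)  # "cre" "flox"

# Make "flox" the reference; the remaining design column then marks cre samples.
cre_lox <- relevel(cre_lox, ref = "flox")
design <- model.matrix(~ cre_lox)

colnames(design)         # "(Intercept)" "cre_loxcre"
design[, "cre_loxcre"]   # 0 0 0 1 1 1
```

With this design, the cre_loxcre coefficient is the cre minus flox difference.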


It doesn't really make any difference (although it might make things simpler to reason about), since you specify the experimental comparisons in your contrasts matrix, not your design matrix.
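
To illustrate the point with base R only, here is a sketch that fits one simulated gene under two parametrisations and recovers the same cre minus flox estimate from both. The expression values are made up, and lm stands in for the per-gene fit. In limma the contrast would typically be built with makeContrasts and applied with contrasts.fit, which is not shown here:

```r
set.seed(1)
group <- factor(rep(c("flox", "cre"), each = 3))

# Simulated expression values for a single gene (illustrative only).
expr <- rnorm(6, mean = ifelse(group == "cre", 2, 0))

# Parametrisation A: intercept plus a flox column ("cre" is the reference).
fit_a <- lm(expr ~ group)
cre_minus_flox_a <- -coef(fit_a)[["groupflox"]]

# Parametrisation B: one column per group, no intercept.
fit_b <- lm(expr ~ 0 + group)
contrast <- c(groupcre = 1, groupflox = -1)
cre_minus_flox_b <- sum(contrast * coef(fit_b)[names(contrast)])

# Both routes give the same cre - flox estimate.
all.equal(cre_minus_flox_a, cre_minus_flox_b)
```

The choice of reference level only changes the sign and labelling of the coefficients, not the comparison that can be extracted from the fit.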
