[R] How to make Contrast for Differential Expressed with more classes than 2 ?
1
0
Entering edit mode
2.5 years ago
ali ▴ 20

Hi everyone, I already did an analysis over samples with Tumor or Healthy. For it I did :

# create design factors [pan-cancer]
pan.cancer <- factor(sample.cond$condition) #Factor for HC/Tumor

# build the design matrix based on the group
design.pan.cancer <- model.matrix(~ 0 + pan.cancer)

# compute common, trend, tagwise dispersion
y <- estimateDisp(y,design = design.pan.cancer ,robust = TRUE)

# fit the negative binomial GLM for each tag 
fit <- glmFit(y, design.pan.cancer)

# build the contrast 
contrast <- makeContrasts(Tumor - HC, levels=colnames(design.pan.cancer))

# Likelihood ratio test
lrt.pan.cancer<- glmLRT(fit, contrast = contrast)

Everything works fine, but now I have to do the same but for multiclasses. Because further I need the significant genes in order to do SVM Classification. How can I make the contrast matrix having 7 classes (I need all vs all) ? The classes now are not Tumor and HC(Healthy Control) but now they are :

Breast,CRC,GBM,Hepatobiliary,Lung,Pancreas,HC

I made the design but I dont know how to make the contrast :

# create design factors [multi-class]
multi.class <- factor(sample.cond$class) #Factor for HC/Breast/GBM/CRC/ Hepatobiliary,Lung,Pancreas,HC

# build the design matrix based on the group
design.multi.class <- model.matrix(~ 0 + multi.class)

# compute common, trend, tagwise dispersion
y.multi.class <- estimateDisp(y.multi.class,design = design.multi.class ,robust = TRUE)

# fit the negative binomial GLM for each tag 
fit.multi.class <- glmFit(y.multi.class, design.multi.class)

# build the contrast 
??
de gene dge R edgeR • 2.6k views
ADD COMMENT
0
Entering edit mode

All vs all. I have 7 classes so a factor of 7. And I need the comparison between all of them. Doing it manually is too much so I was wondering how to do it, im pretty sure there is a way. I hope at least. Example:

Breast Vs CRC , Breast Vs GBM, Breast Vs Hepabiliary .......... until all is compared to all. Is it possible?

ADD REPLY
0
Entering edit mode

I actually just want an ANOVA on the 7 classes , this is what the book says :

To determine the specific input gene lists for the classifying algorithms we performed ANOVA testing for differences

They use edgeR but dont tell the passages so im trying to interpret it. I have a count matrix with 285 samples and 16000 genes. The samples are divided in 7 groups that are the 7 classes (Brca,Crc,GBM,HBC,Lung,PAAD). So what should I do? I just followed edgeR but I dont know how to compare them equally with the fit.

ADD REPLY
6
Entering edit mode
2.5 years ago
Gordon Smyth ★ 7.7k

I assume you're trying to find genes that are DE between any of the cancer types. I would use

design.multi.class <- model.matrix(~ multi.class)
y.multi.class <- estimateDisp(y.multi.class, design.multi.class)
fit.multi.class <- glmQLFit(y.multi.class, design.multi.class, robust = TRUE)
test.multi.class <- glmQLFTest(y.multi.class, design.multi.class, coef=2:7)
topTags(test.multi.class)

This conducts F-tests (analogous to anova) for differences between any of the 7 cancer classes. By skipping the use of 0+ in the design matrix formula, we code contrasts between cancer classes directly into the design matrix coefficients. The use of coef=2:7 tells edgeR that you want to consider all possible contrasts between the classes. You would not use coef=1 of course because that is the intercept.

Correction

The second last code line should have been:

test.multi.class <- glmQLFTest(y.multi.class, coef=2:7)
ADD COMMENT
0
Entering edit mode

First of all thx for the reply. It tells me :

contrast vector of wrong length, should be equal to number of coefficients in the linear model.

I think, is because we cant use coef without defining a contrast. And I need HC too in the comparison that is why I added (~0+ multi.class).

ADD REPLY
0
Entering edit mode

contrast vector of wrong length

This shows that you have not followed my code. This error message will appear only if you specify the contrast argument, whereas I advised you to specify the coef argument

we cant use coef without defining a contrast.

I am the edgeR senior author, and I assure you that you can.

I need HC too in the comparison

The code I suggested treats all the groups equally, including HC.

Would you at least consider trying the code I suggested? It is hard to help you if you don't.

ADD REPLY
0
Entering edit mode

I'm not saying that you are wrong(Im definitely the new one here, no discussion about it) but I actually did copy and paste even before, but here I show the result so you can see if Im doing something wrong :

 > multi.class=relevel(multi.class, ref = "HC")
 > summary(multi.class)
           HC        Breast           CRC           GBM Hepatobiliary          Lung      Pancreas 
           55            39            42            40            14            60            35 
 > design.multi.class <- model.matrix(~ multi.class)
 > head(design.multi.class)
  (Intercept) multi.classBreast multi.classCRC multi.classGBM multi.classHepatobiliary multi.classLung multi.classPancreas
1           1                 1              0              0                        0               0                   0
2           1                 1              0              0                        0               0                   0
3           1                 1              0              0                        0               0                   0
4           1                 1              0              0                        0               0                   0
5           1                 1              0              0                        0               0                   0
6           1                 1              0              0                        0               0                   0
>  y.multi.class <- estimateDisp(y.multi.class, design.multi.class)
>  fit.multi.class <- glmQLFit(y.multi.class, design.multi.class, robust = TRUE)
>  test.multi.class <- glmQLFTest(fit.multi.class, design.multi.class, coef=2:7)
Error in glmLRT(glmfit, coef = coef, contrast = contrast) : 
  contrast vector of wrong length, should be equal to number of coefficients in the linear model.
> topTags(test.multi.class)
Error in topTags(test.multi.class) : object 'test.multi.class' not found
ADD REPLY
1
Entering edit mode

My apologies, there was a typo in my glmQLFTest call. It should simply be:

test.multi.class <- glmQLFTest(fit.multi.class, coef=2:7)

Then all will work correctly.

Of course there was no need to pass the design matrix to the test function! R was interpretting the design matrix as the contrast matrix.

ADD REPLY
1
Entering edit mode

Here's a reproducible code example. You can ignore all the logFC columns in the final topTags table and only pay attention to the F-statistics and p-values. There is no significance to the particular contrasts represented by the coefficients here. Any set of 6 linearly independent contrasts will give the same F-statistics and p-values.

> library(edgeR)
> multi.class <- rep(c("HC","Breast","CRC","GBM","Hepatobiliary","Lung","Pancreas"),c(55,39,42,40,14,60,35))
> multi.class <- factor(multi.class)
> multi.class <- relevel(multi.class, ref = "HC")
> design.multi.class <- model.matrix(~ multi.class)
> 
> nsamples <- length(multi.class)
> ngenes <- 1000
> counts <- matrix(rnbinom(ngenes*nsamples,mu=50,size=20),ngenes,nsamples)
> y.multi.class <- DGEList(counts)
> 
> y.multi.class <- estimateDisp(y.multi.class, design.multi.class)
> fit.multi.class <- glmQLFit(y.multi.class, design.multi.class, robust = TRUE)
> test.multi.class <- glmQLFTest(fit.multi.class, coef=2:7)
> topTags(test.multi.class)
Coefficient:  multi.classBreast multi.classCRC multi.classGBM multi.classHepatobiliary multi.classLung multi.classPancreas 
    logFC.multi.classBreast logFC.multi.classCRC logFC.multi.classGBM logFC.multi.classHepatobiliary logFC.multi.classLung
870                 0.25968             8.85e-02               0.0385                        0.29865               0.07358
72                  0.18822            -9.56e-02               0.0112                        0.08829              -0.10582
100                -0.05362            -1.09e-01               0.1387                        0.17006               0.00789
887                -0.27658            -1.42e-01              -0.0546                        0.00812              -0.00873
958                 0.13107             2.28e-01               0.2945                        0.09153               0.16085
943                 0.13415             2.79e-01               0.0409                        0.13947               0.19345
622                 0.00626            -1.32e-02              -0.1625                       -0.04043              -0.17993
689                 0.22068             5.31e-02               0.1414                        0.04257               0.03656
226                 0.10508             2.33e-02              -0.1711                       -0.04485              -0.01469
910                -0.15098             9.39e-05              -0.0429                       -0.29008              -0.21397
    logFC.multi.classPancreas logCPM    F  PValue   FDR
870                  -0.02302  10.01 3.19 0.00395 0.912
72                    0.05619  10.02 3.08 0.00515 0.912
100                   0.16227  10.04 2.94 0.00723 0.912
887                   0.00872  10.00 2.91 0.00777 0.912
958                   0.22255  10.00 2.86 0.00871 0.912
943                   0.07766  10.01 2.85 0.00888 0.912
622                  -0.23621   9.98 2.83 0.00930 0.912
689                  -0.10954   9.99 2.82 0.00960 0.912
226                   0.15755  10.00 2.81 0.00992 0.912
910                  -0.09149  10.01 2.78 0.01063 0.912
ADD REPLY
0
Entering edit mode

Ok now it works ! , yes I did not notice that too. My question now is , does this contrast says "Specific Tumor against HC (for each specific tumor)" ? Because in this case , is it actually multi-class? Or is just specific Tumor against HC for each Tumor. Like I think this is the right approach but I'm just scared that they mean comparison even between Tumors (but it would be exponential).

ADD REPLY
1
Entering edit mode

It compares everything to everything. You could relevel the factor to use one of the cancers as the reference and you will get identical results. As I said, any set of 6 linearly independent contrasts will give the same F-statistics and p-values. There are only 6 degrees of freedom between 7 groups, so any set of 6 contrasts allows all other possible contrasts to be determined.

ADD REPLY
0
Entering edit mode

Really Clear, as first I up vote the answear since it solved my question ! And if you are willing to answer this following curiosity I would be happy otherwise thank you anyways ! The question is :

I have to do a Multi Class Classification. In order to classify a random sample to one of this 7 classes (HC , Brca, CRC, GBM, HBC, Lung, Pancreas) , I need to choose the best genes that gives me the most differences between the conditions (training) . Would this thing , choose a certain (rownames(topTags)) above a P-Value Threshold, give me the best genes for this goal? Or you think this one just compare 6 to one so I can not use it for classify a random sample as one of the 7 classes?

ADD REPLY

Login before adding your answer.

Traffic: 2072 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6