Mutual exclusivity analysis on expression data
1
0
Entering edit mode
8.2 years ago
orzech_mag ▴ 230

Dear colleagues,

I would like to perform mutual exclusivity test on some expression data. My data are presented in form of contingency table, where columns are patients, rows are genes and "0"/"1" states if the expression of particular gene was favorable on disease recurrence in particular patient. Mutual exclusivity I mean any test that allow to check if genes and their expression rather act in concert or are mutually exclusive (if the effect of those genes is cumulative or not). I saw a lot of algorithms that are probable to perform such analyses, however they are all based on mutational data, not expression like mine (I mean algorithms like MEMo, RME, Dendrix and CoMEt). cBioPortal also performs it, however the analysis is restricted to data stored in cBioPortal and expression is normalized with z-score that doesn't in my field of interest.

I would appreciate any feedback, I don't mind using R/Python etc.

Thanks in advance!

Best regards

python RNA-Seq mutual-exclusivity R • 4.6k views
ADD COMMENT
2
Entering edit mode
8.2 years ago

You can start here : http://stats.stackexchange.com/questions/86318/clustering-a-binary-matrix If you don't have a lot of genes you could also try a fisher exact test between all pairs of genes. The odd ratio will give you the informations about exclusivity (or co-occurence) of the genes. Don't forget to correct the p-values for multi-testing.

ADD COMMENT
0
Entering edit mode

Thanks for answer. I'll read the content. Regarding the Fisher exact test I was thinking about it, however dependently on case I have ~20 genes and ~200-500 samples. I am not sure whether it is not too much for this test and secondly how to correct it for multi-testing.

ADD REPLY
0
Entering edit mode

So let's take an example with 20 genes and 200 samples. In R (sorry the code is not optimal at all..):

g <- 20
s <- 200
E <- round(matrix(runif(g*s), nrow=s, ncol=g)) # generate test matrix
colnames(E) <- 1:g # rename the columns
row.names(E) <- 1:s # rename the rows

# perform a fisher exact test for each pairs of gene and saves the oddsRatio and p-value
res <- NULL
for(i in 1:g){
    for(j in i:g){
        if(i!=j){
            f <- fisher.test(E[,i], E[,j]) 
            res <- rbind(res,cbind(geneA=colnames(E)[i],geneB=colnames(E)[j],oddsRatio=f$estimate,pvalue=f$p.value))
        }   
    }
}
# some formatting
res <- as.data.frame(res)
res$geneA <- factor(res$geneA,levels=as.character(1:g))
res$geneB <- factor(res$geneB,levels=as.character(1:g))
res$oddsRatio <- as.numeric(as.character(res$oddsRatio))
res$pvalue <- as.numeric(as.character(res$pvalue))
# use p.adjust to correct for multi testing using a FDR
res <- cbind(res,fdr=p.adjust(res$pvalue,"fdr"))
# change the FDR in labels for plotting
res$stars <- cut(res$fdr, breaks=c(-Inf, 0.001, 0.01, 0.05, Inf), label=c("***", "**", "*", ""))
# plot with ggplot 2
require(ggplot2)
require(cowplot) # not necessary but the plot is nicer
p <- ggplot(res, aes(geneA, geneB)) + geom_tile(aes(fill = log2(oddsRatio)),colour = "white") + scale_fill_gradient2(midpoint=1) + geom_text(aes(label=stars), color="black", size=5)
p

Here's the results

enter image description here

Hopes it will help

ADD REPLY
0
Entering edit mode

Thank your for this helpful code. I appreciate it :)

ADD REPLY
0
Entering edit mode

I've tried to do it on my data, as your code is actually what I was looking for, however I experienced some issue I cannot skip anyhow.

I tried to insert my own matrix instead of yours generated with code:

E <- round(matrix(runif(g*s), nrow=s, ncol=g)) # generate test matrix

and after I run colnames and row.names I got some errors like: Error:length of dimnames [2] not equal to array extent

Not sure how to fix this issue, as many suggestions from other forums didn't work it out. I would appreciate your further help.

ADD REPLY
0
Entering edit mode

could you print g and s value ?

ADD REPLY
0
Entering edit mode

Sure.

g = 12
s = 300

My data for this analysis: https://drive.google.com/open?id=0B0bjUaxpKQldZlIzeGlpM0lVUGc

Probably I make some mistake trying to insert those data as matrix. You used runif to generate some test data.

Thanks in advance!

ADD REPLY
0
Entering edit mode

how do you open your data in R. Could you post a subset of your data.

ADD REPLY
0
Entering edit mode

I read the data into matrix as follows:

my_data = matrix(c(1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1),
             nrow = 12, ncol = 5, byrow = TRUE)

I presented only partial data, as I cannot upload the whole subset here. It should be nrow = 12 and ncol = 300 with "0" and "1" distributed in particular way.

ADD REPLY
0
Entering edit mode

Ok. You told me that genes are in columns and sample in rows. So you should inverse nrow=300 and ncol=12. Also the for factor renaming you should do :

res$geneA <- factor(res$geneA,levels=colnames(E))
res$geneB <- factor(res$geneB,levels=colnames(E))
ADD REPLY
0
Entering edit mode

I am sorry for the misunderstanding. I did not precisely describe my data. Thank you for your help as the code is working now. I really appreciate it.

ADD REPLY

Login before adding your answer.

Traffic: 2595 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6