Hi,
I'm investigating the potential effect of several mutations on gene expression. I've RNA-seq from ~30 tumors and 4 control samples. I screened for these 30 tumors about 20 genes for mutations. I translated these mutation data into a binary matrix (rows = samples, columns = genes ; 0 = not mutated, 1 = mutated).
For the RNA-Seq I aligned with STAR and counted the number of reads per gene with featurecounts. Then I merged everything into a big read count matrix (column = sample ; rows=gene).
Now I want to detect genes which expression is affected by these mutations. So I used limma as follow:
The design matrix harbors a column for each gene (~20), and an additional column indicating if the samples is a tumor or not. I also add an offset column. For example :
mut dataframe :
offset geneA geneB geneC isTumor
sample1 1 0 1 1 1
sample2 1 1 1 0 1
sample3 1 0 0 1 0
So :
design <- mut
design <- design[,colSums(design)>2] # remove gene that is mutated in less than 3 samples
design <- cbind(offset=1,design) # add offset
# egdeR
dge <- DGEList(counts=counts)
dge <- calcNormFactors(dge)
cpm <- cpm(dge,log=T)
keep.exprs <- rowSums(cpm>1)>=3 # min 3 samples with CPM > 1
dge <- dge[keep.exprs,]
# limma voom
geneExpr <- voom(dge,design)
glm = lmFit(geneExpr$E, design = design)
glm = eBayes(glm)
testResults <- decideTests(glm, method="global",adjust.method="BH", p.value=0.05)[,-1]
Now in testResults I've a matrix containing for each gene and each covariate (the 20 screened genes for mutation) if it's up (=1) or down regulated (=-1) ; (no significant change = 0). Is that correct for you ?
Thanks
edit : I posted the question to bioconductor forum as adviced by Goutham Atla. https://support.bioconductor.org/p/86092/
You would get much quicker response on https://support.bioconductor.org for questions related to statistics and bioconductor packages. Most of the authors are very active there.