Hi All,
We have a specific gene mutation and we would like to learn how it affects breast cancer.
Using R, I retrieved the mutation information for the sequenced cases of TCGA Provisional and stratified the patients into two categories, Mutated and Wild Type. I downloaded the mRNA Expression z-Scores (RNA Seq V2 RSEM) from the cBioPortal website. I would like to look at the differentially expressed genes between these two groups, but I have several questions:
The RNA-seq data are RSEM-normalized. Before doing any further analysis I transformed them to log2(rsem+1). Is that correct?
For differential gene expression analysis, what do you suggest I use? I cannot use DESeq2 or edgeR, as they require raw counts as input.
I used the limma package, but I think the plots I get show that my data has some problem. Do they look OK, or should I do something else?
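To make the transformation concrete, here is a minimal sketch on a toy matrix. The object name `rsem` and its values are placeholders, not the real TCGA data. It assumes non-negative RSEM-normalized values with genes in rows and samples in columns; it would not be meaningful for z-scores, which can be negative.

```r
# Toy RSEM-normalized matrix: genes in rows, samples in columns.
# `rsem` and its values are illustrative placeholders only.
rsem <- matrix(c(0, 1,
                 3, 7,
                 15, 1023),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

# Adding 1 before taking log2 keeps zero values finite: log2(0 + 1) = 0.
log_expr <- log2(rsem + 1)
print(log_expr)
```

The pseudocount of 1 is a convention rather than a requirement; it mainly matters for genes with values near zero.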
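library(edgeR)
library(limma)

# Group labels; assumes the columns of TCGA_comb are ordered as 191 Mut, then 660 WT
group <- c(rep("Mut", 191), rep("WT", 660))

# Design matrix without intercept: one column per group
design <- model.matrix(~ 0 + group)
colnames(design) <- c("Mut", "WT")

y <- TCGA_comb

# voom transformation with the mean-variance trend plot
par(mfrow = c(1, 2))
v <- voom(y, design, plot = TRUE)

# Linear model fit and Mut vs WT contrast
fit <- lmFit(v, design)
cont.matrix <- makeContrasts(PIK3CA_mutVSwt = Mut - WT, levels = design)
fit.cont <- contrasts.fit(fit, cont.matrix)
fit.cont <- eBayes(fit.cont)

# Residual standard deviation vs average expression
plotSA(fit.cont)

# Test calls and full results table
summa.fit <- decideTests(fit.cont)
tab <- topTable(fit.cont, n = Inf, coef = "PIK3CA_mutVSwt")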
Would it be too superficial if I calculated the fold change, p-value and FDR on my own?
a) Fold change: take the average of each gene per group and then compute Log2(B)-Log2(A)
b) p-value: the t.test command of R
c) FDR: p.adjust(pvalue,method="fdr")
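The three manual steps above could be sketched roughly as follows. This is only an illustration on simulated data: `log_expr`, `group` and the group sizes are placeholders. Note one deliberate difference: the sketch takes the difference of group means of the already log2-transformed values, which is not the same as taking log2 of the group means of the untransformed values as written in a).

```r
set.seed(1)

# Simulated log2 expression matrix (genes x samples); placeholder for real data.
log_expr <- matrix(rnorm(5 * 10, mean = 8), nrow = 5,
                   dimnames = list(paste0("gene", 1:5), paste0("s", 1:10)))
group <- factor(c(rep("Mut", 4), rep("WT", 6)))
is_mut <- group == "Mut"

# a) log2 fold change: difference of group means on the log2 scale (Mut - WT)
log_fc <- rowMeans(log_expr[, is_mut]) - rowMeans(log_expr[, !is_mut])

# b) per-gene two-sample t-test (t.test defaults to the Welch test)
p_value <- apply(log_expr, 1,
                 function(x) t.test(x[is_mut], x[!is_mut])$p.value)

# c) FDR: method = "fdr" is an alias for Benjamini-Hochberg in p.adjust
fdr <- p.adjust(p_value, method = "fdr")

res <- data.frame(log_fc = log_fc, p_value = p_value, fdr = fdr)
print(res[order(res$p_value), ])
```

Unlike limma's eBayes step, a plain per-gene t-test does not borrow variance information across genes, so the two approaches can rank genes differently.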
Many many thanks,
Gokce
Could you please tell me how you did this part: getting the mutation information for the sequenced cases of TCGA Provisional in R and stratifying the patients into Mutated and Wild Type?
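The original post does not say how the stratification was done. One possible way, sketched here purely as an assumption, is to start from a vector of sample IDs that carry the mutation and label the expression samples by membership. The IDs and the names `expr_samples` and `mutated_samples` are hypothetical, and how the mutation list is obtained is left open.

```r
# Hypothetical sample IDs, e.g. the column names of the expression matrix.
expr_samples <- c("S01", "S02", "S03", "S04", "S05")

# Hypothetical IDs of samples carrying the mutation; "S99" has no expression data.
mutated_samples <- c("S02", "S05", "S99")

# Label each expression sample by whether it appears in the mutated list.
group <- factor(ifelse(expr_samples %in% mutated_samples, "Mut", "WT"),
                levels = c("Mut", "WT"))
names(group) <- expr_samples

print(table(group))
```

Building `group` from the sample IDs keeps the labels aligned with the expression columns regardless of their order. Samples that were never sequenced for mutations would need to be removed beforehand rather than labelled WT.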