Hi All,
We have a specific gene mutation and would like to learn how it affects breast cancer.
Using R, I got the mutation information for the sequenced cases of TCGA Provisional and stratified the patients into two categories, Mutated and Wild Type. I downloaded the mRNA Expression z-Scores (RNA Seq V2 RSEM) from the cBioPortal website. I would like to look at the differentially expressed genes between these two groups, but I have several questions:
The RNA-seq data is RSEM-normalized. Before doing any further analysis I transformed it to log2(rsem+1). Is that correct?
For differential gene expression analysis, what do you suggest I use? I cannot use DESeq2 or edgeR, as they require raw counts as input.
I used the limma package, but I guess what I get shows that my data has some problem. Does it look OK, or should I do something else?
library(edgeR)
library(limma)
# 191 mutated samples followed by 660 wild-type samples
group <- c(rep("Mut", 191), rep("WT", 660))
design <- model.matrix(~ 0 + group)
colnames(design) <- c("Mut", "WT")
# log2(rsem+1) expression matrix, genes in rows
y <- TCGA_comb
par(mfrow = c(1, 2))
v <- voom(y, design, plot = TRUE)
fit <- lmFit(v, design)
cont.matrix <- makeContrasts(PIK3CA_mutVSwt = Mut - WT, levels = design)
fit.cont <- contrasts.fit(fit, cont.matrix)
fit.cont <- eBayes(fit.cont)
plotSA(fit.cont)
summa.fit <- decideTests(fit.cont)
tab <- topTable(fit.cont, n = Inf, coef = "PIK3CA_mutVSwt")
Would it be too superficial if I calculated the fold change, p-value and FDR on my own?
a) Fold change: take the average of each gene per group, then log2(B) - log2(A)
b) p-value: the t.test command in R
c) FDR: p.adjust(pvalue, method="fdr")
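To make the manual calculation in a) to c) concrete, here is a minimal sketch in base R. It runs on simulated data rather than the TCGA matrix; the names log_expr, is_mut and manual_tab are made up for illustration. It also assumes the values are already on the log2 scale, so the fold change is taken as the difference of the group means of the log2 values.

```r
# Simulated log2 expression matrix: genes in rows, samples in columns.
# (Hypothetical stand-in for the real data; sizes are arbitrary.)
set.seed(1)
n_genes <- 200
group <- factor(c(rep("Mut", 10), rep("WT", 15)), levels = c("WT", "Mut"))
log_expr <- matrix(rnorm(n_genes * length(group), mean = 8, sd = 1),
                   nrow = n_genes,
                   dimnames = list(paste0("gene", seq_len(n_genes)), NULL))
# Spike a true difference into the first 10 genes of the Mut samples.
log_expr[1:10, group == "Mut"] <- log_expr[1:10, group == "Mut"] + 2

is_mut <- group == "Mut"

# a) log2 fold change: difference of the group means on the log2 scale.
log_fc <- rowMeans(log_expr[, is_mut]) - rowMeans(log_expr[, !is_mut])

# b) One Welch t-test per gene (Welch is the default of t.test).
p_value <- apply(log_expr, 1,
                 function(x) t.test(x[is_mut], x[!is_mut])$p.value)

# c) Benjamini-Hochberg FDR; method = "fdr" is an alias for "BH".
fdr <- p.adjust(p_value, method = "fdr")

manual_tab <- data.frame(logFC = log_fc, P.Value = p_value, FDR = fdr)
manual_tab <- manual_tab[order(manual_tab$P.Value), ]
head(manual_tab)
```

Unlike limma, this tests each gene in isolation, with no sharing of variance information across genes.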
Many many thanks,
Gokce
Dear Kristoffer,
Thanks a lot for your answer.
I knew I would be more comfortable with the raw data, but I was being forced to work with this normalized data. Maybe it is better to insist on working with the raw data again.
So to clarify, you are suggesting that I use voom+limma on the raw count data instead of edgeR or DESeq2, right?
Yes, I suggest voom-limma, because when you have many replicates (which you do in the TCGA data) the methods perform similarly in benchmarks. The difference is then that instead of waiting hours for edgeR or DESeq2, you wait seconds or minutes for limma, which allows much greater flexibility.
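A minimal sketch of what the voom-limma workflow on raw counts could look like, using simulated counts in place of the real TCGA count matrix. The object names (raw_counts, dge, MutVsWT) and the sample sizes are made up for illustration, and the filtering and normalization steps shown are one common choice, not the only one.

```r
library(edgeR)
library(limma)

# Simulated raw count matrix (hypothetical stand-in for TCGA raw counts).
set.seed(1)
n_genes <- 500
group <- factor(c(rep("Mut", 6), rep("WT", 6)), levels = c("WT", "Mut"))
raw_counts <- matrix(rnbinom(n_genes * length(group), mu = 100, size = 5),
                     nrow = n_genes,
                     dimnames = list(paste0("gene", seq_len(n_genes)),
                                     paste0("sample", seq_along(group))))

# Build the count object, drop lowly expressed genes, then compute
# TMM normalization factors (the calcNormFactors default).
dge <- DGEList(counts = raw_counts, group = group)
keep <- filterByExpr(dge, group = group)
dge <- dge[keep, , keep.lib.sizes = FALSE]
dge <- calcNormFactors(dge)

# One column per group; column order follows levels(group).
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# voom converts counts to log2-CPM and estimates precision weights.
v <- voom(dge, design, plot = FALSE)
fit <- lmFit(v, design)
cont.matrix <- makeContrasts(MutVsWT = Mut - WT, levels = design)
fit.cont <- eBayes(contrasts.fit(fit, cont.matrix))
tab <- topTable(fit.cont, coef = "MutVsWT", number = Inf)
head(tab)
```

The key difference from the code in the question is the input: voom is given raw counts here, not log2-transformed normalized values.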
Thanks a lot for your answer.