I am trying to carry out an analysis of genes differentially expressed in public data but I am in trouble. I followed the steps:
1) I downloaded the data in "raw count" format (RNA-seq)
2) I normalized the data by testing: normalizeVSN (limma), quantile and voom functions
3) I transformed the data into log2 and filtered. I have a resulting matrix with ~ 20k genes
4) The boxplot and pearson correlation show that the data is ok. When I do the PCA, the samples do not separate into groups.
5) I am using the limma to perform the analysis of DEGs (CTRL_4h - INF_4h; CTRL_12h - INF_12h; and CTRL_24h - INF_24h)
But the result seems strange to me. The authors identified a greater number of regulated genes. In my result CTRL_4h - INF_4h I find ~ 250, with many q-values repeated.
Can anyone evaluate the pipeline applied?
These are some of my scripts:
data <- read.delim("~/zikv_limma test/data.txt", row.names=1)
design <- read.delim("mydesign.txt", header=TRUE)
v <- voom(data, mydesign, plot=TRUE)
#or
vsn <- normalizeVsn(data)
fit <- lmFit(vsn_filter, mydesign)
contrast.matrix <- makeContrasts("CTRL_4h - INF_4h", "CTRL_12h - INF_12h", "CTRL_24h - INF_24h", levels=design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)
output <- topTreat(fit2, coef=1, number=Inf, adjust.method="BH")
topTable(fit2, coef=1, adjust="BH")
results <- decideTests(fit2)
thanks in advance!
Thank you so much for the link! It is an excellent tutorial and solved my problem!