I want to run a differential expression analysis for my microarray dataset. It has 46 samples and I have a complex experimental design. The data was sequenced across 4 chips (12 samples per chip). The lab has given me access to these files:
- One bcmatrix file per chip (these appear to be raw counts, as all the rows are composed of integers)
- One rpm.bcmatrix file per chip (these should be the normalized data)
- A .gene.chp file for each sample.
I tried to do an analysis using TAC, but I am not sure what it is doing with each covariate, and I would prefer to code everything myself for reproducibility and better documentation.
So I have been trying to use R for my analysis. I am familiar with the limma package, but my experience is with RNA-seq data, not microarray. I have been reading the manual, but I could not find any way to deal with .chp files.
I have tried to read them with the affxparser package, but my R session crashes every time.
I am now wondering if I could just use the rpm.bcmatrix files instead.
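For reference, here is a minimal sketch of what I mean by using those files. It assumes each rpm.bcmatrix file is a tab-delimited table with probe IDs in the first column and one column per sample, which I have not confirmed. The toy files and all names in it (chip1, probeA, s1, read_chip, ...) are made up so that the snippet runs on its own, and the log2 transform with a pseudocount of 1 is just a guess at a sensible starting point:

```r
# Toy stand-ins for two per-chip files; the real file layout is an assumption.
tmp <- tempdir()
chip_files <- file.path(tmp, c("chip1.rpm.bcmatrix", "chip2.rpm.bcmatrix"))
write.table(data.frame(probe = c("probeA", "probeB", "probeC"),
                       s1 = c(10, 0, 250), s2 = c(12, 3, 300)),
            chip_files[1], sep = "\t", quote = FALSE, row.names = FALSE)
write.table(data.frame(probe = c("probeC", "probeA", "probeB"),
                       s3 = c(280, 8, 1), s4 = c(310, 15, 0)),
            chip_files[2], sep = "\t", quote = FALSE, row.names = FALSE)

# Read one chip as a numeric matrix with probe IDs as row names
# (read_chip is a hypothetical helper, not part of any package).
read_chip <- function(path) {
  as.matrix(read.delim(path, row.names = 1, check.names = FALSE))
}
chips <- lapply(chip_files, read_chip)

# Keep only probes present on every chip, in one common order,
# then bind the samples from all chips into a single matrix.
common <- Reduce(intersect, lapply(chips, rownames))
expr <- do.call(cbind, lapply(chips, function(m) m[common, , drop = FALSE]))

# log2 transform with a pseudocount of 1 (an arbitrary choice).
log_expr <- log2(expr + 1)
dim(log_expr)
```

With the real data, chip_files would point at the four rpm.bcmatrix files, and the resulting matrix would be what I pass to lmFit as eset.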
So, my main question is: how do I load this data in R, and what kind of normalization/pre-processing steps should I be doing before testing for differentially expressed genes? I assume for differential expression you just do the standard:
library(limma)
# Define the design (metadata has one row per sample, in the same order as the columns of eset)
design <- model.matrix(~0 + Age + sex + RIN + Chip + individual + condition, data = metadata)
# Fit the model; eset is the normalized expression data
fit <- lmFit(eset, design)
fit <- eBayes(fit)
# Get the full table of results for the wound coefficient
a <- topTable(fit, coef = "conditionwound", number = Inf)
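One thing I am unsure about in that design is whether all the covariates can be estimated together, since Age and sex are constant within each individual. Below is a minimal base R sketch of how I would check this. The metadata is a made-up toy example (4 hypothetical individuals, each sampled in both conditions), not my real sample sheet, and RIN and Chip are left out to keep it short:

```r
# Toy metadata: 4 individuals, each sampled in both conditions (hypothetical values).
metadata <- data.frame(
  individual = factor(rep(c("p1", "p2", "p3", "p4"), each = 2)),
  condition  = factor(rep(c("control", "wound"), times = 4)),
  Age        = rep(c(34, 51, 47, 29), each = 2),
  sex        = factor(rep(c("F", "M", "F", "M"), each = 2))
)

# Design with individual plus covariates that never vary within an individual.
design_full <- model.matrix(~0 + Age + sex + individual + condition, data = metadata)
rank_full <- qr(design_full)$rank

# Paired design: individual alone accounts for all between-individual differences.
design_paired <- model.matrix(~0 + individual + condition, data = metadata)
rank_paired <- qr(design_paired)$rank

# A design is full rank only when its rank equals its number of columns.
c(columns = ncol(design_full), rank = rank_full)
c(columns = ncol(design_paired), rank = rank_paired)
```

If the rank is smaller than the number of columns, some coefficients cannot be estimated. As far as I understand, limma provides is.fullrank() and nonEstimable() for the same check on the real design matrix.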