Question

Summarization step in oligo-nt microarray data analysis

8.4 years ago

arronar ▴ 290

Hi.

I have been given a dataset from an Affymetrix microarray experiment and I'm trying to analyze it. I have read a lot of papers and tutorials so that I can take it step by step.

The data are in a data.frame like this:

| Gene/treat |   Control_1   | Control_2 | Control_3 | TreatA_1 | TreatA_2 | TreatA_3 |
|----------|:-------------:|----------:|----------:|---------:|---------:|---------:|
| gene1    |       2.65    |    3.01   |   2.20    |  3.65    |   4.01   |   3.25   |
| gene2    |       1.54    |    1.27   |   2.01    |  2.65    |   3.11   |   2.90   |
| gene3    |       1.34    |    1.00   |   2.50    |  1.65    |   2.01   |   2.24   |

The values are random, and there are more than three genes and more treatments (TreatB_1, TreatB_2, TreatB_3, etc.).

So far I have done the normalization step using Bioconductor's normalize.quantiles() function.

For further analysis (to be able to create MA plots and compute differential expression) a summarization step is needed. As I understand from what I have read, one common method for this is median polish, which is used in the RMA algorithm (but I don't have the CEL files to be able to run it).

If I am right, applying that method gives a final matrix with one value per treatment, after which you can create MA plots and go on to the differential expression step.

I searched all over the net for a way to summarize my table but couldn't find a solution. For example, the medpolish function in R gives me the overall median and the residual terms of the additive model behind median polish, but I am not sure how to add these values to get the correct expression value for each gene on each array.
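For reference, the additive model that medpolish() fits is value = overall + row + col + residual. The sketch below uses a toy probe-by-array matrix; the numbers and the names `probes`, `fit`, `rebuilt` and `expr_per_array` are made up for illustration. In RMA-style summarization the rows would be the probes of one probe set, and the summarized value for each array is `overall + col`. This assumes probe-level data: a table that already has one row per gene has nothing left to summarize this way.

```r
# Toy log2 intensities: 4 probes (rows) of one probe set on 3 arrays (columns).
# The numbers are invented for illustration.
probes <- matrix(c(7.1, 7.4, 8.0,
                   6.8, 7.0, 7.9,
                   7.5, 7.6, 8.4,
                   6.9, 7.3, 8.1),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("probe", 1:4), paste0("array", 1:3)))

fit <- medpolish(probes, trace.iter = FALSE)

# Additive model: probes = overall + row + col + residuals
rebuilt <- fit$overall + outer(fit$row, fit$col, "+") + fit$residuals

# One summarized value per array (RMA-style): overall effect + array effect
expr_per_array <- fit$overall + fit$col
print(expr_per_array)
```

The `rebuilt` matrix reproduces the input, which is a quick way to check how the pieces returned by medpolish() fit together.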

Can someone help me, or give me a hint or an example, on how to get a summarized matrix that looks like the one below?

Also, if you think that I am approaching this in the wrong way, I would be thankful to hear it.

| Gene/treat | Control_Sum | TreatA_Sum |
|----------|:-----------:|-----------:|
| gene1    |     2.45    |     3.31   |  
| gene2    |     1.24    |     1.47   | 
| gene3    |     1.54    |     2.00   |
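If the table already holds one value per gene per array, collapsing replicates to one column per condition is plain averaging, not median polish. A minimal sketch using the three example rows from the first table; the object names (`expr`, `group`, `group_means`) are made up, and it assumes the trailing `_1`, `_2`, ... suffix is the only thing that distinguishes replicates of a condition.

```r
expr <- data.frame(
  Control_1 = c(2.65, 1.54, 1.34),
  Control_2 = c(3.01, 1.27, 1.00),
  Control_3 = c(2.20, 2.01, 2.50),
  TreatA_1  = c(3.65, 2.65, 1.65),
  TreatA_2  = c(4.01, 3.11, 2.01),
  TreatA_3  = c(3.25, 2.90, 2.24),
  row.names = c("gene1", "gene2", "gene3")
)

# Condition label per column: strip the trailing replicate number
group <- sub("_[0-9]+$", "", colnames(expr))

# Average the replicate columns within each condition
group_means <- sapply(unique(group), function(g) {
  rowMeans(expr[, group == g, drop = FALSE])
})
print(round(group_means, 2))
```

Averaged columns like these are fine for plotting, but a differential expression test needs the individual replicates, because the replicate-to-replicate variation is what the test statistic is built on.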

Thank you.

microarray summariazation median polish • 2.5k views

updated 8.4 years ago by theobroma22 ★ 1.2k • written 8.4 years ago by arronar ▴ 290

I think your next step should be differential expression analysis using limma. Compare each treatment group to the control group and find the differentially expressed genes with topTable. Please read the limma user's guide.

8.4 years ago by Benn 8.4k

So, no need for summarization? What if I want to create an MA plot for Control and TreatA, for example?

8.4 years ago by arronar ▴ 290

Take the A (AveExpr) and M (logFC) values from the topTable, and make an MA plot yourself.
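A minimal sketch of that workflow, assuming the normalized values are on the log2 scale and using simulated data in place of the real table. The objects `expr`, `group`, `design`, `fit` and `tt` are illustrative names, not anything from the original dataset.

```r
library(limma)

set.seed(1)
# Simulated log2 expression: 200 genes, 3 control and 3 treated arrays
expr <- matrix(rnorm(200 * 6, mean = 8, sd = 1), nrow = 200,
               dimnames = list(paste0("gene", 1:200),
                               c(paste0("Control_", 1:3), paste0("TreatA_", 1:3))))
group <- factor(rep(c("Control", "TreatA"), each = 3))

# Control is the reference level, so the second coefficient is TreatA - Control
design <- model.matrix(~ group)
fit <- eBayes(lmFit(expr, design))
tt <- topTable(fit, coef = "groupTreatA", number = Inf)

# MA plot: A = average log2 expression, M = log2 fold change
plot(tt$AveExpr, tt$logFC, pch = 16, cex = 0.5,
     xlab = "A (AveExpr)", ylab = "M (logFC)")
abline(h = 0, lty = 2)
```

With more treatment groups (TreatB and so on) the design matrix gains one coefficient per extra group, and topTable can be called once per coefficient to get each comparison against control.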

8.4 years ago by Benn 8.4k

Thank you very much for answering. Do you think that these equations are correct for my example?

$A = \log_2{\sqrt{\text{avg}(\text{Controls}) \cdot \text{avg}(\text{TreatmentA})}}$

$M = \log_2{\frac{\text{avg}(\text{Controls})}{\text{avg}(\text{TreatmentA})}}$
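As a point of comparison, here is a sketch of the usual MA convention when the group averages are already on the log2 scale, which is an assumption here. The log of a product or ratio becomes a sum or difference, so A is the mean of the two group averages and M is their difference. The sign of M is a convention; treatment minus control matches limma's logFC when control is the reference level. The numbers and names below are illustrative.

```r
# Assumed log2-scale group averages for three genes (illustrative numbers)
ctrl  <- c(gene1 = 2.62, gene2 = 1.61, gene3 = 1.61)
treat <- c(gene1 = 3.64, gene2 = 2.89, gene3 = 1.97)

A <- (treat + ctrl) / 2   # average log2 expression
M <- treat - ctrl         # log2 fold change, treatment vs control

# The raw-scale form of A gives the same numbers once the
# log2 values are converted back to intensities with 2^x
A_raw <- log2(sqrt(2^ctrl * 2^treat))

plot(A, M, pch = 16, xlab = "A", ylab = "M")
abline(h = 0, lty = 2)
```

Written with controls in the numerator, M comes out with the opposite sign (control minus treatment), so up-regulated genes would plot below zero.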

8.4 years ago by arronar ▴ 290

score 1 · Answer 1 · 2017-03-25

In R:

Log2-transform the summarized data instead of using RMA.
Validate your normalization method by comparing plots, such as histograms, from before and after it.
Use the oligo package for the steps after RMA normalization to make your MA plots. As long as your annotated ExpressionSet object has been created successfully, you can call the MAplot() function.

If I have a matrix of 100,000 random expression values (100 features by 1,000 samples) with mean equal to 100 and sd equal to 3, I can still run the log2 transformation to see if anything happens.

library(Biobase) # provides ExpressionSet()
set.seed(325)
X = matrix(rnorm(100 * 1000, mean=100, sd=3), ncol = 1000, nrow = 100) # one draw per cell, so nothing is recycled
hist(X) # before the log2 transform
X2 = log(X, 2)
hist(X2) # after the transform, and my values are now on the log2 scale!!
X2 = ExpressionSet(assayData = X2) # assayData must be a matrix, not a data.frame
X2
#ExpressionSet (storageMode: lockedEnvironment)
#assayData: 100 features, 1000 samples 
#element names: exprs 
#protocolData: none
#phenoData: none
#featureData: none
#experimentData: use 'experimentData(object)'
#Annotation:  
library(oligo)
par(mfrow=c(3,1))
MAplot(X2, which=c(1:3), ylim=c(-7,7), 
cex = 3, main="vs pseudo median reference")