Question

RMA Normalization and SAM Analysis

2

13.2 years ago

jevans ▴ 30

I'm analyzing gene expression data from Affymetrix HG-U133 Plus 2.0 arrays for differential expression between two sample groups. After obtaining the .CEL files, I used RMAExpress to RMA-normalize the data and export it as a log-scale .txt file, which I then load into Excel for SAM analysis using the two-class unpaired response type. However, every time I run SAM, I get a very steep plot, an implausibly high false discovery rate (usually 1), and only a handful of significant genes. I'm not sure whether the problem lies in my normalization or somewhere further back in the experiment, but I'm fairly confident in the quality of the arrays, so I suspect the former.

Would someone mind posting a quick, step-by-step walkthrough of a basic SAM analysis (from .CEL file to significant gene list) to guide me in the right direction? I'd really appreciate it. Cheers.

sam microarray • 3.6k views

ADD COMMENT • link updated 13.2 years ago by David Quigley 11k • written 13.2 years ago by jevans ▴ 30

3

That can be so many things. Also, most people here don't trust Excel for doing "real" bioinformatics analyses. I'd check whether you loaded the data in the right format: are your cells formatted as numeric? It could be biology as well: if there is no differential gene expression, you couldn't detect it with the best arrays in the world. How many replicates do you have, by the way? How many genes are significant using a different test? I suggest that you move to a more reproducible toolchain, e.g. R, or maybe NetAffx. The siggenes package in R does the same analysis as the Excel plugin, and the affy package handles RMA. I believe array experiments are still too expensive to waste by analyzing them with office tools.
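The "are your cells numeric?" check can also be done outside Excel. Below is a minimal Python sketch, assuming the export is tab-delimited with one header row and probe IDs in the first column (check this against your actual file). The function name `find_non_numeric` and the toy data are made up for illustration.

```python
import csv
import io

def find_non_numeric(handle, delimiter="\t"):
    """Return (row, column, cell) for every expression cell that is not a number.

    Assumes a header row and probe IDs in the first column (an assumption
    about the export layout, not something confirmed by the tool's docs).
    Row and column numbers are 1-based, as a spreadsheet would show them.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    next(reader)  # skip the header row of sample names
    problems = []
    for row_num, row in enumerate(reader, start=2):
        for col_num, cell in enumerate(row[1:], start=2):
            try:
                float(cell)
            except ValueError:
                problems.append((row_num, col_num, cell))
    return problems

# Toy export: one value uses a decimal comma, one cell is empty.
toy = io.StringIO("probe\ts1\ts2\nA\t7.1\t7,3\nB\t\t6.0\n")
problems = find_non_numeric(toy)
for row_num, col_num, cell in problems:
    print(f"row {row_num}, column {col_num}: {cell!r}")
```

Decimal commas and empty cells are typical ways a text export ends up being read as text rather than numbers, so any hits here are worth fixing before running SAM.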

ADD REPLY • link 13.2 years ago by Michael 56k

1

Many thanks for your response. I've begun using R as my primary resource for analysis and I'm finding my results far more reproducible and consistent with the expected biology.

ADD REPLY • link 13.2 years ago by jevans ▴ 30

1

Look at the data with a different method and see whether something is wrong with it, e.g. too much variability.

ADD REPLY • link 13.2 years ago by Istvan Albert 103k

score 2 · Answer 1 · 2012-06-14

The simplest explanation is that there isn't in fact any significant difference between your two conditions after accounting for multiple testing. You don't give N for either group or say why you think a difference is likely, so we can't know what might be wrong. If both your sample size and the effect size are small, there may be no statistically meaningful differences.

I assume from your approach that you are less familiar with R, so one sanity check would be to run a t-test on the normalized data in Excel using the TTEST function. You can easily set this up for one row and then copy-paste it for the rest. Select a two-tailed test and assume equal variance. Sort by the results and look for really low P values. If there aren't any, that's an indication that SAM is working and you have to look further back. Some things to do:
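If you'd rather not do this in Excel, here is a rough Python sketch of the same per-gene, equal-variance t statistic. The data layout (a dict of probe ID to log2 values, plus a 0/1 label per column) and all names are assumptions for illustration. It ranks by |t| only; for actual P values you'd need a t distribution, e.g. `scipy.stats.ttest_ind` with `equal_var=True`.

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample t statistic assuming equal variance (Student's t-test).

    Raises ZeroDivisionError if both groups have zero variance.
    """
    na, nb = len(a), len(b)
    # Pooled variance: each group's sample variance weighted by its degrees of freedom.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = math.sqrt(sp2 * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se

def rank_genes(expr, labels):
    """Return (probe, t) pairs sorted by |t|, largest first.

    expr:   {probe_id: [log2 expression per sample]}  (hypothetical layout)
    labels: one 0/1 group label per sample column
    """
    results = []
    for probe, values in expr.items():
        g0 = [v for v, lab in zip(values, labels) if lab == 0]
        g1 = [v for v, lab in zip(values, labels) if lab == 1]
        results.append((probe, pooled_t(g1, g0)))
    return sorted(results, key=lambda r: abs(r[1]), reverse=True)

# Toy data: probe_A differs between groups, probe_B does not.
expr = {
    "probe_A": [7.0, 7.2, 6.9, 9.1, 9.0, 9.3],
    "probe_B": [5.0, 5.4, 5.1, 5.2, 5.0, 5.3],
}
labels = [0, 0, 0, 1, 1, 1]
ranked = rank_genes(expr, labels)
for probe, t in ranked:
    print(probe, round(t, 2))
```

If nothing in the real data stands out at the top of this ranking either, that points away from SAM and toward the data or the labels.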

Check your group labels and your individual sample labels. Use what you know about the biology of the experiment to check for sane behavior from expected positive controls. For example, in a radiation-response experiment, check P21. If one group is tumor and the other normal, check KI67 and CCND1. And so on.
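A positive-control check along these lines could look like the sketch below. The probe names are placeholders, not real probe set IDs (look up the actual probe sets for your control genes in the array annotation), and the data layout is the same hypothetical one: log2 values per probe plus a 0/1 label per sample.

```python
from statistics import mean

def control_shift(expr, labels, probe):
    """Difference in mean log2 expression (group 1 minus group 0) for one probe.

    On log2 data this difference is the log2 fold change.
    """
    values = expr[probe]
    if len(values) != len(labels):
        # A length mismatch usually means the labels do not line up with the columns.
        raise ValueError("number of labels does not match number of samples")
    g0 = [v for v, lab in zip(values, labels) if lab == 0]
    g1 = [v for v, lab in zip(values, labels) if lab == 1]
    return mean(g1) - mean(g0)

# Placeholder probe names and toy values; replace with your own control probes.
expr = {
    "control_probe_1": [6.0, 6.2, 5.8, 8.0, 8.2, 7.8],
    "control_probe_2": [9.0, 9.1, 8.9, 9.0, 9.1, 8.9],
}
labels = [0, 0, 0, 1, 1, 1]
shift = control_shift(expr, labels, "control_probe_1")
print(f"control_probe_1 log2 fold change: {shift:.2f}")
```

If a gene you expect to move shows no shift, or moves in the wrong direction, swapped or misassigned group labels are one of the first things to rule out.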