I'm analyzing gene expression data from Affymetrix HG-U133 Plus 2.0 arrays for differential expression between two sample groups. After obtaining the .CEL files, I used RMAExpress to RMA-normalize the data and export it to a log-scale .txt file, which I then load into Excel for SAM analysis using the two class unpaired response type. However, every time I run SAM, I get quite a steep-looking plot, an implausibly high false discovery rate (usually 1) and only a handful of significant genes. I'm not sure whether the problem lies in my normalization or somewhere further back in the experiment, but I'm fairly confident in the quality of the arrays, so I suspect the former.
Would someone mind posting a quick, step-by-step walkthrough of a basic SAM analysis (from .CEL file to significant gene list) to guide me in the right direction? I'd really appreciate it. Cheers.
That can be so many things. Also, most people here don't trust Excel for doing "real" bioinformatics analyses. I'd check whether you loaded the data in the right format: are your cells formatted as numeric? It could be biology as well. If there is no differential gene expression, you couldn't detect it with the best arrays in the world. How many replicates do you have, by the way? How many genes are significant using a different test? I suggest that you move to a more reproducible toolchain, e.g. R, or maybe NetAffx. The siggenes package in R does the same analysis as the Excel plugin, and the affy package handles RMA. I believe array experiments are still too expensive to waste by using office tools.
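The format check suggested above could be sketched roughly as follows. This is a minimal Python sketch using only the standard library, not part of SAM or RMAExpress. The function name `check_expression_table`, the assumed file layout (header row, probe set IDs in the first column, one column per sample) and the log2 range heuristic are all assumptions for illustration, and the expression values in the example are made up.

```python
import csv
import io
import math


def check_expression_table(text, delimiter="\t"):
    """Parse a delimited expression table and report format problems.

    Assumed layout (hypothetical): first row is a header, first column
    holds the probe set ID, every other column is one sample.
    """
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=delimiter) if r]
    header, body = rows[0], rows[1:]
    bad_cells = []   # (probe_id, sample_name, raw_cell) for unusable cells
    values = []      # every cell that parsed as a finite number
    for row in body:
        probe_id = row[0]
        for sample, cell in zip(header[1:], row[1:]):
            try:
                value = float(cell)
            except ValueError:
                # Text cells, empty cells, "NA", or comma decimals like "7,5"
                bad_cells.append((probe_id, sample, cell))
                continue
            if math.isnan(value) or math.isinf(value):
                bad_cells.append((probe_id, sample, cell))
                continue
            values.append(value)
    report = {
        "n_probes": len(body),
        "n_samples": len(header) - 1,
        "bad_cells": bad_cells,
        "min": min(values) if values else None,
        "max": max(values) if values else None,
    }
    # Rough heuristic (assumption): RMA values are log2 intensities, so
    # they should be positive and well below 20. Larger values suggest
    # the table was exported on the raw intensity scale.
    report["looks_log2"] = bool(values) and 0 <= report["min"] and report["max"] <= 20
    return report


# Toy table with made-up values; one cell uses a comma as decimal separator.
sample = (
    "probe\tA1\tA2\tB1\tB2\n"
    "1007_s_at\t10.2\t10.4\t9.9\t10.1\n"
    "1053_at\t7,5\t7.4\t7.7\t7.6\n"
)
report = check_expression_table(sample)
print(report["bad_cells"])
print(report["looks_log2"])
```

If `bad_cells` is not empty, a spreadsheet would most likely treat those cells as text, which is one way a SAM run can go wrong before any statistics are involved.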
Many thanks for your response. I've begun using R as my primary resource for analysis and I'm finding my results far more reproducible and consistent with the expected biology.
Look at the data with a different method and see whether something is wrong with it, e.g. too much variability.
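One way to look at the data with a different method is a per-gene Welch t statistic combined with label permutations to estimate how many calls would be expected by chance. The sketch below is a simplified stand-in and not the SAM procedure itself (it has no fudge factor and no delta-based cutoff). The names `welch_t` and `permutation_fdr`, the threshold and the toy values are assumptions for illustration.

```python
import math
import random
import statistics


def welch_t(x, y):
    """Welch's t statistic for two lists of log2 values (>= 2 values each)."""
    diff = statistics.mean(x) - statistics.mean(y)
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    if se == 0:
        # No within-group variability at all: zero or infinite statistic.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def permutation_fdr(matrix, labels, threshold, n_perm=200, seed=0):
    """Estimate the FDR for calling genes with |t| >= threshold.

    matrix: one row per gene, one value per sample.
    labels: 0/1 group label per sample, in column order.
    Returns (observed calls, mean calls under shuffled labels, FDR estimate).
    """
    def count_calls(lab):
        calls = 0
        for row in matrix:
            g0 = [v for v, l in zip(row, lab) if l == 0]
            g1 = [v for v, l in zip(row, lab) if l == 1]
            if abs(welch_t(g0, g1)) >= threshold:
                calls += 1
        return calls

    observed = count_calls(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    null_counts = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)          # break the link between samples and groups
        null_counts.append(count_calls(shuffled))
    expected_false = statistics.mean(null_counts)
    fdr = min(1.0, expected_false / observed) if observed else float("nan")
    return observed, expected_false, fdr


# Toy data (made up): gene 1 differs strongly between groups, gene 2 does not.
toy = [
    [1.0, 1.1, 0.9, 5.0, 5.1, 4.9],
    [3.0, 3.1, 2.9, 3.05, 2.95, 3.0],
]
labels = [0, 0, 0, 1, 1, 1]
observed, expected_false, fdr = permutation_fdr(toy, labels, threshold=5.0)
print(observed, expected_false, fdr)
```

With three samples per group there are only 20 distinct ways to split the samples, and 2 of them reproduce the original grouping up to sign, so permutation-based estimates are coarse at that replicate count. An estimated FDR close to 1 across all thresholds would suggest that within-group variability is as large as the difference between groups.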