Hello everyone,
I am relatively new to microarray technology and gene expression analysis. I am using publicly available microarray data and want to know whether the data is of high enough quality for a larger analysis I am doing.
I read in the Expression Console user guide that the hybridization control probesets ought to follow a pattern of increasing expression from one probeset to the next. For example, AFFX-r2-Ec-BioB should show lower expression than AFFX-r2-Ec-BioC if the correct protocol was followed. However, when processing data for different Affymetrix HG-U133 Plus 2 arrays in RStudio, I found that this pattern held for one of the arrays but not for another. I have included boxplots of the RMA-normalized expression for two arrays below. (One was made in base R and the other in ggplot; my apologies.)
How should I interpret this information as it relates to what's happening on the microarray? Is there a standard method of checking the quality of a microarray from publicly available CEL files? Any insight is appreciated.
Hi Will, please share the code that you used to access the data, and also the code to generate the plots. Note that I can infer that the study number is GSE16028; moreover, I can immediately see that the authors provided data 'normalised' via MAS5, not RMA.
Sure. I downloaded the CEL files from GEO; for the first plot, it's GSE16028. Of note, the authors report having processed two batches of microarrays, so I subset them locally. However, the boxplots look similar across all microarrays in this particular study.
I used the getGEO function to pull the pheno data. Here's the code I used to look at these probes:
The error should have no impact on the structure needed for testing each microarray, which I show by previewing the table:
Next, I ran a test to see whether the probes were in the expected order, then stored the arrays that didn't meet the criteria in a list that I can plot at will.
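For readers following along, here is a minimal sketch of this kind of order check. It is not the code from the original post: it is plain Python rather than R, the expression values are made up, and the short labels (BioB, BioC, BioD, Cre) stand in for the full probeset IDs. It assumes the usual expectation that the hybridization controls increase in signal in the order BioB < BioC < BioD < Cre.

```python
# Expected order of the hybridization controls, lowest to highest signal.
# Assumption: BioB < BioC < BioD < Cre; short labels replace full probeset IDs.
EXPECTED_ORDER = ["BioB", "BioC", "BioD", "Cre"]


def follows_expected_order(signals, order=EXPECTED_ORDER):
    """Return True if the control signals strictly increase along `order`."""
    values = [signals[name] for name in order]
    return all(lo < hi for lo, hi in zip(values, values[1:]))


def flag_arrays(expression):
    """Return the names of arrays whose controls break the expected order."""
    return [array for array, signals in expression.items()
            if not follows_expected_order(signals)]


# Made-up log2 values for two hypothetical arrays, for illustration only.
example = {
    "array_A": {"BioB": 8.1, "BioC": 9.4, "BioD": 11.2, "Cre": 13.0},
    "array_B": {"BioB": 9.9, "BioC": 9.1, "BioD": 11.5, "Cre": 12.8},
}

flagged = flag_arrays(example)
print(flagged)  # ['array_B']
```

The same logic carries over to an R expression matrix by applying the comparison to each column. Whether a strict inequality is the right criterion is a judgment call; a tolerance may be more appropriate for noisy data.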
The last two lines of my code produce this graph:
I used the data from GSM401059 and ran it through MAS5 to get the following: