Hi guys,
I am looking at Affymetrix microarray data from normal and cancer cells that I found on GEO. I want to do stats on the expression level of just one gene of interest between normal and cancer cells. I have done some reading and I am just looking for confirmation that my approach is correct.
1. I am starting from CEL files (which, as I understand it, are just the raw intensity measurements for each probe).
2. I read these CEL files into R and know how to get a data-frame-like object where each column is a different sample, each row is a probe, and each entry is a raw intensity value.
3. The way I understand it, next I would do RMA normalization, which would normalize these raw intensities between samples and output log2-transformed, normalized intensities.
4. I know that if I wanted to find all DE genes I would go to something like limma at this point and use proper methods to control the FDR. My main question is: if I just want to look at a single probe or two, can I simply do a t-test on the RMA-normalized intensities?
Thank you for your time.
Hey, your Steps 1-3 seem okay. Also, yes, the CEL files contain the raw fluorescence intensities. It is possible to generate a high-density chip image to look for large-scale effects, as I mention here: A: Microarray image explanation
For step 4, it would honestly be just as quick to fit the linear model for all genes and generate a nominal and/or FDR-adjusted p-value that way.
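As a rough sketch of what fitting the linear model for all genes could look like with limma: the matrix below is simulated rather than taken from GEO, and the names expr, group and probe_42 are placeholders, so treat this as an illustration under those assumptions and not as a finished pipeline.

```r
library(limma)

# Simulated log2 expression matrix standing in for RMA output:
# 200 probe sets (rows) by 8 samples (columns). All values are made up.
set.seed(1)
expr <- matrix(
  rnorm(200 * 8, mean = 8, sd = 0.5),
  nrow = 200,
  dimnames = list(paste0("probe_", 1:200), paste0("s", 1:8))
)

# Hypothetical sample labels: first 4 normal, last 4 cancer.
group <- factor(rep(c("normal", "cancer"), each = 4),
                levels = c("normal", "cancer"))

# Design with an intercept (normal) and a cancer-vs-normal coefficient.
design <- model.matrix(~ group)

# Fit one linear model per row, then moderate the variances.
fit <- eBayes(lmFit(expr, design))

# Full table in the original row order, with nominal and BH-adjusted p-values.
res <- topTable(fit, coef = "groupcancer", number = Inf, sort.by = "none")

# Look up a single row of interest.
res["probe_42", c("logFC", "P.Value", "adj.P.Val")]
```

The P.Value column is the nominal p-value from the moderated t-test and adj.P.Val is the Benjamini-Hochberg adjusted one, so both are available for the same row.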
When you mention 'probe', are you interested in just an exon of a particular gene or an entire gene?
If you want to share your code, please do. I am aware that microarray analysis can be difficult for those just starting in this area, in part because analysis pipelines differ for different chip types.
First, thank you for getting back.
The platform of the array is GPL570 [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array.
I am mostly following this guide: https://wiki.bits.vib.be/index.php/Analyze_your_own_microarray_data_in_R/Bioconductor
I am interested in summarizing expression of one entire gene. I read about how to pick a probe for one gene and found the "catches" and the lists of approaches people use. My thought was to pick the probe with the highest average intensity to best represent the gene I am interested in (have not made my mind up about that though).
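To make the "highest average intensity" idea concrete, here is a minimal sketch on a toy matrix. The IDs ps_1 to ps_4 and all values are invented, and the candidate vector stands in for whichever probe sets your annotation maps to the gene, so this only shows the selection step itself.

```r
# Toy log2 expression matrix: rows are probe sets, columns are samples.
# All IDs and values are invented for illustration.
set.seed(7)
expr <- matrix(
  rnorm(4 * 6, mean = rep(c(5, 9, 7, 8), times = 6), sd = 0.3),
  nrow = 4,
  dimnames = list(c("ps_1", "ps_2", "ps_3", "ps_4"), paste0("s", 1:6))
)

# Hypothetical set of probe sets annotated to the gene of interest.
candidate <- c("ps_1", "ps_2", "ps_3")

# Mean log2 intensity of each candidate across all samples,
# ignoring which group each sample belongs to.
mean_intensity <- rowMeans(expr[candidate, , drop = FALSE])

# Keep the candidate with the highest mean.
best <- names(which.max(mean_intensity))
best
```

Because the means are taken over all samples without looking at the normal/cancer labels, the choice of probe does not use the comparison that is tested afterwards.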
This is how I am getting the normalized intensity in R:
My thought was that, because I am interested in only one gene, going the FDR-adjusted way would inflate the p-value for my gene of interest, which didn't seem appropriate: I am not interested in multiple comparisons, just in the one.
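A small illustration of the effect being described, using base R's p.adjust on made-up p-values (the 1000 values and the 0.01 for the gene of interest are assumptions for the example, not results from any dataset):

```r
# Made-up nominal p-values for 1000 probe sets with no real signal.
set.seed(3)
p_all <- runif(1000)

# Pretend the first entry is the pre-specified gene of interest.
p_all[1] <- 0.01

# Benjamini-Hochberg adjustment across all 1000 tests.
p_bh <- p.adjust(p_all, method = "BH")

# Nominal versus adjusted p-value for the gene of interest.
c(nominal = p_all[1], adjusted = p_bh[1])
```

The adjusted value can never be smaller than the nominal one, which is why the distinction matters when only a single, pre-specified gene is being tested.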
Is doing a t-test using the values in data.matrix from my code sample, for one "representative" probe picked based on some criteria, a reasonable approach, or is this a really bad idea? Is this something people even do? If not, then I can look into the linear model approach more. The tutorial I am following has a limma example with a moderated t-test that I can look into.
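For reference, the single-probe t-test in question could be sketched like this. The matrix expr is simulated and stands in for the normalized data.matrix, and probe_A, the group labels and the artificial shift are all assumptions for illustration.

```r
# Simulated log2 intensities standing in for the RMA-normalized matrix:
# 3 probe sets (rows) by 10 samples (columns). All values are made up.
set.seed(42)
expr <- matrix(
  rnorm(3 * 10, mean = 8, sd = 0.5),
  nrow = 3,
  dimnames = list(c("probe_A", "probe_B", "probe_C"), paste0("s", 1:10))
)

# Hypothetical sample labels: first 5 normal, last 5 cancer.
group <- factor(rep(c("normal", "cancer"), each = 5),
                levels = c("normal", "cancer"))

# Add an artificial shift to the cancer samples so there is something to detect.
expr["probe_A", group == "cancer"] <- expr["probe_A", group == "cancer"] + 1

# Pull out the one row of interest and compare the two groups.
y <- expr["probe_A", ]
tt <- t.test(y ~ group)  # Welch's t-test (unequal variances) is the default

tt$estimate  # group means on the log2 scale
tt$p.value   # nominal p-value for this single comparison
```

Since the values are on the log2 scale, the difference between the two group means is a log2 fold change.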
Thanks for answering, first microarray analysis here.