I'm trying to reproduce the RMA preprocessing algorithm. In the summarization step, normalized expression values of individual probes are combined together to get the overall expression values of the genes they represent.
My first step would be to take from the matrix containing individual probes (that results from the normalization step) all submatrices, where each submatrix contains a probe set relating to one common gene. Is there a specific package that I can use in order to produce these submatrices?
For reference, I'm working through the workflow "An end to end workflow for differential gene expression using Affymetrix microarrays". So basically I'm trying to reproduce everything that is done in this line of code (Step 9):
palmieri_eset_norm <- oligo::rma(raw_data, target = "core")
Thank you for your reply.
What I meant is that I want to take, from the big matrix containing the normalized expression values of all probes, all k submatrices (k = number of genes represented by the chip), where each submatrix contains all probes relating to the same gene. By "probeset" I meant the set of probes related to the same gene. At this point I want to summarize at the gene level, not the exon level.
So basically I have the following steps in mind for the summarization stage:
1. Create the k submatrices
2. Apply the Median Polish method on each submatrix to estimate the corresponding gene expression values in each microarray
3. Combine the gene expression values in a bigger matrix, which would be the final output
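The three steps above can be sketched in base R. This is only a minimal illustration on toy data, not oligo's internal code: `probe_mat`, `probe_to_gene`, and `summarize_probeset` are hypothetical names made up for the example, and `stats::medpolish` stands in for a hand-written median polish. It assumes the values are already background-corrected, quantile-normalized, and on the log2 scale.

```r
# Toy log2-scale probe matrix: 6 probes (rows) x 3 arrays (columns).
probe_mat <- matrix(
  c(8.1, 8.4, 7.9,
    8.3, 8.6, 8.0,
    8.0, 8.2, 7.8,
    5.1, 5.9, 5.5,
    5.3, 6.1, 5.6,
    5.0, 5.8, 5.4),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("probe", 1:6), paste0("array", 1:3))
)

# Hypothetical probe-to-gene mapping, one entry per row of probe_mat.
# With real data this would come from the chip annotation.
probe_to_gene <- c("geneA", "geneA", "geneA", "geneB", "geneB", "geneB")

# Step 1: split the row indices by gene and extract one submatrix per gene.
row_idx <- split(seq_len(nrow(probe_mat)), probe_to_gene)
submatrices <- lapply(row_idx, function(i) probe_mat[i, , drop = FALSE])

# Step 2: median polish each submatrix. With probes as rows and arrays as
# columns, the per-array summary is the overall effect plus the column effect.
summarize_probeset <- function(m) {
  fit <- medpolish(m, trace.iter = FALSE)
  fit$overall + fit$col
}

# Step 3: combine the per-gene vectors into a genes x arrays matrix.
gene_mat <- t(sapply(submatrices, summarize_probeset))
colnames(gene_mat) <- colnames(probe_mat)
print(gene_mat)
```

The `drop = FALSE` keeps a probeset with a single probe as a one-row matrix, so the same function can be applied to every gene.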
I apologize if I'm mixing up the terms or if I'm not being very clear; I still don't have a deep understanding of the subject.
I see, but why not use the standard RMA approach and then summarise to gene level (if needed) after RMA? Or just follow the tutorial to which you linked?
In Part 8 ( https://www.bioconductor.org/packages/devel/workflows/vignettes/maEndToEnd/inst/doc/MA-Workflow.html#8_relative_log_expression_data_quality_analysis ), they perform background correction and log2 transformation, but not quantile normalisation, it seems, after which they plot the medians in an attempt (I suppose) to show how median summarisation functions.
The RMA approach actually already performs median polish for the summarisation step; although, again, on 'Exon' arrays, this summarisation is to the Exon level, while, on other 'Gene' arrays, the summarisation is to gene level.
I guess I did not make my point clear, and I apologize for that. We have been assigned to write the code of the RMA function ourselves from scratch, i.e. to reproduce the function. I was assigned the summarization step.
Yes I know that, but again, I myself have to write my own code that performs this step. And that's where I'm coming from.
From the info I collected about this step, I guess it proceeds as follows:
1. log2 transformation of the quantile-normalized expression values
2. Median polishing of probesets (again, sorry if I'm mixing up the terms)
3. Returning an expression matrix similar to the initial one of probe intensities, but containing gene or exon expression values rather than probe values
However, I couldn't find any resource that explains what actually happens internally in the code.
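For anyone writing this step by hand, here is a minimal sketch of median polish on a single probeset. It follows the same alternating row-median and column-median sweeps as base R's `stats::medpolish`, but it is an illustration only and not a copy of what `oligo::rma` runs internally. The function name `median_polish` and the toy intensities are made up for the example.

```r
# Hand-written median polish for one probeset
# (rows = probes, columns = arrays), on log2-scale values.
median_polish <- function(m, max_iter = 10, eps = 0.01) {
  resid <- m
  overall <- 0
  row_eff <- numeric(nrow(m))
  col_eff <- numeric(ncol(m))
  old_sum <- 0
  for (iter in seq_len(max_iter)) {
    # Sweep out row medians and move them into the row effects.
    rd <- apply(resid, 1, median)
    resid <- sweep(resid, 1, rd)
    row_eff <- row_eff + rd
    # Re-centre the column effects; their median goes to the overall term.
    d <- median(col_eff)
    col_eff <- col_eff - d
    overall <- overall + d
    # Sweep out column medians and move them into the column effects.
    cd <- apply(resid, 2, median)
    resid <- sweep(resid, 2, cd)
    col_eff <- col_eff + cd
    # Re-centre the row effects in the same way.
    d <- median(row_eff)
    row_eff <- row_eff - d
    overall <- overall + d
    # Stop once the sum of absolute residuals no longer changes much.
    new_sum <- sum(abs(resid))
    if (new_sum == 0 || abs(new_sum - old_sum) < eps * new_sum) break
    old_sum <- new_sum
  }
  list(overall = overall, row = row_eff, col = col_eff, residuals = resid)
}

# Toy normalized intensities for one probeset: 3 probes x 3 arrays.
normalized <- matrix(c(256, 300, 280,
                       512, 610, 540,
                       128, 160, 150), nrow = 3, byrow = TRUE)

log_vals <- log2(normalized)            # step 1: log2 transform
fit <- median_polish(log_vals)          # step 2: median polish
expr_per_array <- fit$overall + fit$col # step 3: one value per array
print(expr_per_array)
```

The row effects capture probe affinities and the column effects capture array-to-array differences, which is why the per-array summary is taken as the overall term plus the column effect.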
I guess the log2 transformation is easy. However, I'm not sure how to proceed with step 2, namely how to produce the submatrices of probesets on which I will then apply the median polish. Is there a function in a certain package that can produce these submatrices?
Also, am I even getting it right? Sorry if I'm being naive but I'm only a couple of weeks into this topic and it's still not very clear at this point.