This is a follow-up to my previous question.
I would like to implement the following steps, given in the supplementary file of this study, to reproduce Figure 1 of the paper.
We used Affymetrix microarray data from a recent thorough analysis of the mouse and human transcriptomes [1]. We selected all 54 adult mouse non-cancer samples. The raw intensity data were transformed to normalized expression levels with the robust multi-array average (RMA) low-level algorithm [2] implemented in the BioConductor package [3]. We used standard settings, including perfect match (PM) only, model-based background and quantile normalization across experiments [4]. Similar results were obtained using the microarray analysis suite (MAS5) function followed by log-transformation to calculate expression levels (data not shown).
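To make the "quantile normalization across experiments" step in the quoted methods more concrete, here is a minimal pure-Python sketch. It is not the RMA implementation from BioConductor (RMA also performs background correction and median-polish summarization); the function name `quantile_normalize` and the toy intensities are made up for illustration, and ties are broken naively by position.

```python
from statistics import mean


def quantile_normalize(columns):
    """Quantile-normalize a list of equally long sample columns.

    Each inner list holds the intensities of one array (sample).
    Ties are broken by original position, which is a simplification.
    """
    n_probes = len(columns[0])
    if any(len(col) != n_probes for col in columns):
        raise ValueError("all samples must have the same number of probes")

    # Sort each sample independently.
    sorted_cols = [sorted(col) for col in columns]

    # Reference distribution: mean across samples at each rank.
    reference = [mean(col[rank] for col in sorted_cols) for rank in range(n_probes)]

    normalized = []
    for col in columns:
        # Probe indices ordered by intensity within this sample.
        order = sorted(range(n_probes), key=lambda i: col[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            # The probe at a given rank receives the reference value for that rank.
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized


# Toy data (hypothetical): three samples, four probes each.
samples = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(samples)
```

After this step every sample has the same set of values; only the order of the probes within each sample differs.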
The mouse data are available on GEO under accession number GSE1133. The data are available in different formats: CDF, CIF, GIN, PSI, SIF, PROBE, TAB, TXT. I am not sure which of these formats contains the raw intensity data and has to be downloaded to implement the procedure described above.
Many thanks for the response. Yes, the raw CEL files are available here. Figure 1, which I want to reproduce, is in this article (link here). A description of how the figure was created can be found in the supplementary file. The figure was created using the data from this study (link here).
In total, 438 GSM files are listed. I am not sure how to distinguish human from mouse samples (I think this can be filtered using the platform ID), or cancerous from normal samples. Any suggestion on which package to use for the RMA normalization illustrated here would be really helpful.
I think everything prefixed MGM is mouse, and the rest (1B/3A) is human. Simply click the GSM... links; each will tell you the organism. Check whether the pattern I suggested above holds true for the majority of the samples.
Thank you. It is mentioned that the GPL1073 (GNF1M) platform is for mouse (GSM18584 to GSM18705) and GPL1074 (GNF1H) is for human (GSM18706 to GSM18863).
However, I couldn't find the platform ID in the CEL files.
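Since the platform ID is not in the CEL files themselves, one workaround is to split the files by GSM accession number using the ranges quoted above. A minimal sketch, assuming each file name contains its GSM accession; the file names below and the helper `platform_for` are hypothetical, and the ranges should be checked against the GEO sample pages before relying on them.

```python
import re

# Inclusive accession ranges as quoted in this thread (not independently verified).
PLATFORM_RANGES = {
    "GPL1073 (GNF1M, mouse)": range(18584, 18705 + 1),
    "GPL1074 (GNF1H, human)": range(18706, 18863 + 1),
}


def platform_for(filename):
    """Return the platform label for a file name containing a GSM accession, or None."""
    match = re.search(r"GSM(\d+)", filename)
    if match is None:
        return None
    number = int(match.group(1))
    for label, numbers in PLATFORM_RANGES.items():
        if number in numbers:
            return label
    return None


# Hypothetical file names, for illustration only.
example_files = [
    "GSM18584.CEL.gz",
    "GSM18705.CEL.gz",
    "GSM18706.CEL.gz",
    "GSM99999.CEL.gz",
]
mouse_files = [f for f in example_files if platform_for(f) == "GPL1073 (GNF1M, mouse)"]
```

Files outside both ranges return None, so samples from any other platform in the series are left out rather than guessed at.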
Go to the supplementary file GSE1133_RAW.tar. Click on "custom" and it will lead you to all the .CEL files in this dataset. You can download whichever you need for your analysis.
Thank you. I am trying to normalize using the following code
Is this right? I am trying to normalize all samples (i.e. GSM18584 to GSM18705) together.