EDIT 2011-06-21: In the original version of this question, I misstated that the Cy3 channel contained a measure of a common reference from pooled controls. Instead, the Cy3 channels are actually a measure of the non-exposed culture from the same animal.
This question is very similar to an unanswered question asked on the Bioconductor mailing list. (The only response I found was unhelpful.) I will re-cast it in the particular problem I'm trying to solve.
GSE13465 contains experiments on two organisms:
- human
- rat
I only care about data from the rat cultures. (The human data is present due to submitter error.)
Three biological replicates were constructed and measured.
For each biological replicate, the rat cultures took place in two media types:
- rat hepatocytes in normal media
- rat hepatocytes in modified media
Each media condition was paired with an acetaminophen (APAP) treatment condition:
- no exposure (0 mM APAP), the control
- exposure to 5 mM APAP
- exposure to 10 mM APAP
Expression levels were measured using the two-channel Agilent 011868 Rat Oligo Microarray G4130A (GPL890). All gene expression measurements were performed with the labels as such:
- Channel 1 (Cy5): 0 mM APAP (control) or 5 or 10 mM APAP (treatment)
- Channel 2 (Cy3): 0 mM APAP (control)
I would like to perform the following contrasts:
- 5 mM APAP vs. control, standard media
- 10 mM APAP vs. control, standard media
- 5 mM APAP vs. control, modified media
- 10 mM APAP vs. control, modified media
Using GEOquery, I obtained the gene expression data for the GEO Series
> library(GEOquery)
> library(limma)
> gse13465 <- getGEO("GSE13465", GSEMatrix = TRUE)
Found 2 file(s)
GSE13465-GPL887_series_matrix.txt.gz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1254k 100 1254k 0 0 302k 0 0:00:04 0:00:04 --:--:-- 315k
File stored at:
/tmp/Rtmp2VCw8p/GPL887.soft
GSE13465-GPL890_series_matrix.txt.gz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1604k 100 1604k 0 0 452k 0 0:00:03 0:00:03 --:--:-- 474k
File stored at:
/tmp/Rtmp2VCw8p/GPL890.soft
> ratdata <- gse13465[[2]] # The rat data is in the second experiment
> ratdata
ExpressionSet (storageMode: lockedEnvironment)
assayData: 20501 features, 18 samples
element names: exprs
protocolData: none
phenoData
sampleNames: GSM339437 GSM339438 ... GSM339454 (18 total)
varLabels: title geo_accession ... data_row_count (42 total)
varMetadata: labelDescription
featureData
featureNames: 3 4 ... 22572 (20501 total)
fvarLabels: ID COL ... SEQUENCE (20 total)
fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
Annotation: GPL890
I'm uncertain how to procede from here, however. The expression values that are in the ExpressionSet for the rat data are described by the submitters as the "Lowess [sic] normalized log 2 ratio (Cy5/Cy3); signal calculated using Imagene 5.0 and GeneSight 4.1 software (Biodiscovery, Palo Alto, CA)." I do not really understand how this relates to an MAList, which is the usual input for tha lmFit
function. I tried the obvious, but it did not work:
> MA <- as(ratdata, "MAList")
Error in as(ratdata, "MAList") :
no method or default for coercing "ExpressionSet" to "MAList"
I'm also not completely certain for how to set up the fit and contrast matrices; in this case there are not measurements from each channel but rather a single value per probeset representing the log-2 ratio.
I would really appreciate any help in performing this contrast, both from the standpoint of helping my research but also just wrapping my head around R/Bioconductor, and limma and GEOquery in particular.
I'm working on an answer, but my initial thought is that you might need to grab the supplementary data files. These contain separate files with intensities for Cy5 and Cy3, which I think you'll need for limma.
I thought this too, Neil, but I got even more confused because I do not recognize the file formats. From reading limma User's Guide, I thought both the Cy5 and Cy3 intensities should be in the same file, then read in with
read.maimages(targets, source="agilent")
, but I take it these are not the Agilent files.It's very likely that the format is just "some kind of delimited text", rather than anything specific. There are some limma examples floating around the Web describing how to get that type of file into the appropriate Bioconductor object.
I'm actually guessing they're ImaGene files based on the limma User's Guide stating, "ImaGene stores red and green intensities in separate files," (Section 4.3, page 16 of the PDF as of limma v.3.8.2), there being separate Cy3 and Cy5 files in the supplementary material of the GSMs, and the VALUE description in the GSMs stating, "signal calculated using Imagene 5.0 and GeneSight 4.1 software."