Hi all. I have been trying to carry out a task (see here), but I think there is a much easier way than the overly complicated one I described in my previous post. I have completely confused myself and would appreciate some help!
I have 10 gene expression data sets that I downloaded from GEO. Each consists of expression data (.CEL files) and phenotype data (the age of each sample, either as a category, "old" or "young", or as a numerical age). The 10 data sets come from different organisms (mouse, rat and human) and are all Affymetrix single-channel, although on different platforms (e.g. GPL85, GPL96). The aim is to run a differential expression analysis of genes with age in each data set separately, and then use a vote-counting method to see which genes are differentially expressed across the data sets overall.
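To make the vote-counting idea concrete, here is a minimal sketch in plain Python. It assumes each data set has already produced a list of significant genes with a direction ("up" or "down" with age), and that gene identifiers have already been mapped to a common namespace across species (e.g. via orthologs), which is not shown. The function name `vote_count`, the `min_votes` threshold and the gene names are all hypothetical.

```python
from collections import Counter

def vote_count(results_per_dataset, min_votes=2):
    """Count how many data sets call each gene up or down with age.

    results_per_dataset: list of dicts, one per data set, mapping
    gene -> "up" or "down" for genes called significant in that data set.
    Returns {gene: (direction, votes)} for genes that reach min_votes in
    one direction; the direction with the most votes wins.
    """
    votes = Counter()
    for calls in results_per_dataset:
        for gene, direction in calls.items():
            votes[(gene, direction)] += 1

    consensus = {}
    for (gene, direction), n in votes.items():
        best_so_far = consensus.get(gene, (None, 0))[1]
        if n >= min_votes and n > best_so_far:
            consensus[gene] = (direction, n)
    return consensus

# Hypothetical calls from three data sets (placeholder gene names).
example = [
    {"GeneA": "up", "GeneB": "down"},
    {"GeneA": "up", "GeneC": "up"},
    {"GeneA": "up", "GeneB": "down", "GeneC": "down"},
]
consensus = vote_count(example)
print(consensus)
```

Here GeneC gets one vote in each direction, so it does not reach the threshold and is dropped.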
As you can see from my previous post, I think I took a roundabout route and ran into problems. Does anyone have sample code showing how I could do this more easily? One alternative approach is as follows.
I have a phenotype file listing a set of samples and their phenotypic data, e.g.
id,Age
GSM1330619,14
GSM1330620,14
GSM1330621,14
GSM1330622,14
GSM1330623,2
GSM1330624,2
GSM1330625,2
GSM1330626,2
This is from data set GDS5412. *NB: this is not ALL of the samples from GDS5412. I only want a particular subset of the samples, so any code needs to take my own phenotype file as input.*
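The subsetting requirement can be illustrated with a small sketch in plain Python: read the phenotype file, then keep only those samples from a larger expression table. The layout of `expression` (sample ID mapped to per-probe values), the helper names, and the probe and extra-sample names are assumptions made for illustration only; the expression values are placeholders, not real data.

```python
import csv
import io

# The phenotype file from the post, inlined so the example is self-contained.
PHENO_CSV = """id,Age
GSM1330619,14
GSM1330620,14
GSM1330621,14
GSM1330622,14
GSM1330623,2
GSM1330624,2
GSM1330625,2
GSM1330626,2
"""

def read_phenotypes(handle):
    """Return {sample_id: age}, preserving the order of the file."""
    return {row["id"]: float(row["Age"]) for row in csv.DictReader(handle)}

def subset_samples(expression, phenotypes):
    """Keep only the samples listed in the phenotype file, in that order.

    Raises KeyError if a requested sample has no expression data.
    """
    missing = [s for s in phenotypes if s not in expression]
    if missing:
        raise KeyError(f"no expression data for: {missing}")
    return {s: expression[s] for s in phenotypes}

phenotypes = read_phenotypes(io.StringIO(PHENO_CSV))

# Placeholder expression table: the eight wanted samples plus one
# unwanted sample ("UNWANTED_SAMPLE" is a made-up name).
expression = {s: {"probe_1": 7.0} for s in phenotypes}
expression["UNWANTED_SAMPLE"] = {"probe_1": 7.5}

subset = subset_samples(expression, phenotypes)
print(list(subset))
```

Failing loudly on a missing sample is a deliberate choice here: a silently dropped sample would leave the design and the expression matrix out of step.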
So what I want to do is:
1. Connect to GEO and obtain a set of GDSXXX files (I have the list that I want).
2. For each GDS file, use my own phenotype file, which lists the samples and phenotypes that I want.
3. Input this data set into limma.
4. Run RMA on the raw CEL files.
5. Summarise the per-probe expression values into per-gene values.
6. Do a differential expression analysis of the samples' gene expression against the phenotype file that I have provided (e.g. in this case, I am asking whether gene expression differs between the samples that are 2 months old and those that are 14 months old).
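Steps 5 and 6 can be sketched in plain Python to show the logic. This is only an illustration under stated assumptions: probes are collapsed to genes by averaging (one of several possible summaries), the values are assumed to be on the log2 scale (as RMA output is), and the comparison is an ordinary Welch t-statistic, which is a simpler stand-in and not the moderated t-statistic that limma computes. All probe names, gene names and values are made up.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance

def collapse_probes(probe_values, probe_to_gene):
    """Average all probes that map to the same gene, sample by sample.

    probe_values: {probe_id: [value per sample]}
    probe_to_gene: {probe_id: gene}; unmapped probes are dropped.
    """
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:
            grouped[gene].append(values)
    return {gene: [mean(col) for col in zip(*rows)]
            for gene, rows in grouped.items()}

def welch_t(group_a, group_b):
    """Return (mean difference, Welch t-statistic) for two groups."""
    diff = mean(group_a) - mean(group_b)
    se = sqrt(variance(group_a) / len(group_a)
              + variance(group_b) / len(group_b))
    return diff, (diff / se if se > 0 else float("nan"))

# Ages in sample order, matching the phenotype file in the post.
ages = [14, 14, 14, 14, 2, 2, 2, 2]

# Made-up log2 expression values, one list entry per sample.
probe_values = {
    "p1": [8.0, 8.2, 7.9, 8.1, 6.0, 6.1, 5.9, 6.2],
    "p2": [8.4, 8.0, 8.3, 8.1, 6.2, 6.3, 6.1, 6.0],
    "p3": [5.0, 5.1, 4.9, 5.0, 5.0, 4.9, 5.1, 5.0],
    "p4": [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0],  # no gene mapping
}
probe_to_gene = {"p1": "GeneA", "p2": "GeneA", "p3": "GeneB"}

gene_values = collapse_probes(probe_values, probe_to_gene)

results = {}
for gene, values in gene_values.items():
    old = [v for v, a in zip(values, ages) if a == 14]
    young = [v for v, a in zip(values, ages) if a == 2]
    results[gene] = welch_t(old, young)

for gene, (log2_fc, t_stat) in results.items():
    print(gene, round(log2_fc, 3), round(t_stat, 2))
```

With only four samples per group, a per-gene t-test like this has little power, which is one reason a method that borrows information across genes is usually preferred for real data.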
If anyone could provide some basic code for this I would really appreciate it; as you can see from my previous post, I'm very confused! I started a new post because I think this is a new topic (combining GDS2eset and limma), whereas my old topic was about doing everything manually, so its title is not relevant here. If anyone feels this should instead be a comment on my old post, please let me know; I have no problem moving it.
I think the bones of the code that I need are:
and then something like this for the differential expression analysis:
This code obviously doesn't work: I can't work out how to do some of it, and I get errors. Could someone improve on it or suggest an alternative? I am happy to pull everything down from GEO, as long as I can then extract a subset of samples and their phenotypes from each data set to work on.