Hi,
Can I continue the analysis after obtaining the toptable of differentially expressed genes using the R code from GEO2R, or should I do my own analysis? I have many samples in my dataset. How should I define the groups in GEO2R, given that this influences the result of the toptable?
Is the normalization in GEO2R a log transformation? Thank you in advance.
GEO2R is just an interface between R and the Gene Expression Omnibus (GEO) database. It usually allows you to directly import data that is already normalised. The normalisation type, if Affymetrix microarray data, will most likely consist of a 3-step process known as RMA (Robust Multiarray Average) normalisation:
background correction
quantile normalisation
log base 2 transformation
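To make the last two steps concrete, here is a minimal base-R sketch on a made-up intensity matrix. It is not the affy implementation of RMA (which also performs background correction and probe-set summarisation); the function name quantile_normalise and the toy data are assumptions for illustration only.

```r
# Toy intensity matrix: 5 probes x 4 samples (simulated values, not real data)
set.seed(1)
raw <- matrix(rexp(20, rate = 0.01) + 50, nrow = 5,
              dimnames = list(paste0("probe", 1:5), paste0("sample", 1:4)))

# Quantile normalisation: force every sample to share the same distribution
quantile_normalise <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")  # rank of each value within its sample
  sorted <- apply(x, 2, sort)                          # each column sorted in ascending order
  ref    <- rowMeans(sorted)                           # mean of the k-th smallest values across samples
  out    <- apply(ranks, 2, function(r) ref[r])        # replace each value by the reference at its rank
  dimnames(out) <- dimnames(x)
  out
}

# Log base 2 transformation of the normalised intensities
norm_log2 <- log2(quantile_normalise(raw))

# After normalisation, every sample has the same quantiles
apply(norm_log2, 2, quantile)
```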
When you import this data into R, you can do anything that you want with it. Usually, people fit a linear model with limma and then extract the per-gene statistics from the fit (log fold-change, p-value, adjusted p-value) via the topTable() function.
If you wish to merge datasets ('groups'?) together, then they should ideally be from the same microarray version.
Hi Kevin and Happy New Year 2018. Thank you for your answer.
OK, for the normalisation, I already did it with the rma function in my R code. Does this give the same result? I also tested gcRMA normalisation with the gcrma function. Which is the best method for normalisation? For defining groups: I mean that in GEO2R we can group samples of a GSE dataset to do differential expression analysis. How can I do this? Should I assign all of the samples to groups, or only some of them?
Hi! Happy New Year!
Yes, RMA and gcRMA will both perform the background correction, quantile normalisation, and log transformation automatically for you. gcRMA is slightly different in that it takes into account biases related to the DNA bases G and C, which can, for example, affect the annealing temperature of a probe to the template DNA. It is better to use gcRMA, as it will result in less criticism of your work.
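As a rough sketch of how the two methods can be run side by side: the helper name normalise_cel_files and the directory path are hypothetical, and the code assumes that the Bioconductor packages affy, gcrma and Biobase are installed and that the raw CEL files have been downloaded.

```r
# Hypothetical helper: normalise all CEL files in a directory with RMA or gcRMA.
# Requires the Bioconductor packages affy, gcrma and Biobase.
normalise_cel_files <- function(cel_dir, method = c("rma", "gcrma")) {
  method <- match.arg(method)
  raw <- affy::ReadAffy(celfile.path = cel_dir)  # AffyBatch of raw probe intensities
  eset <- if (method == "rma") affy::rma(raw) else gcrma::gcrma(raw)
  Biobase::exprs(eset)  # matrix of log2 expression values, probe sets x samples
}

# Example call (the path is a placeholder, not a real location):
# expr_gcrma <- normalise_cel_files("path/to/CEL_files", method = "gcrma")
```

Running both and comparing the resulting matrices (for example with a scatter plot of one sample) is a simple way to see how much the choice matters for a given dataset.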
For the other question, you should only combine samples if they are from the same microarray, for example, Affymetrix HuGene, etc.
How many GEO datasets are you analysing? Are they all based on the same microarray?
In order to conduct the differential expression analysis, limma is used. It involves the construction of a 'design matrix' based on your groups of interest and then fitting a linear model to your data based on this design. Then, statistical values can be obtained from the linear fit.
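A design matrix can be built in base R with model.matrix(). The sketch below uses an invented grouping of six samples; the group labels are assumptions, not taken from any real dataset.

```r
# Hypothetical grouping of six samples; replace with the groups in your own metadata
group <- factor(c("wildtype", "wildtype", "wildtype", "mutant", "mutant", "mutant"),
                levels = c("wildtype", "mutant"))

# One column per group and no intercept, so each coefficient is a group mean
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
design
```

Samples that are not part of the comparison of interest can be left out by subsetting both the expression matrix and the grouping before the design is built.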
There are many, many limma tutorials online. I can help here but I just need to understand better your experimental set-up.
Thank you. I am analysing this dataset, from the Affymetrix ATH1 microarray: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE5632
Thanks, coincidentally, that group is based very near where I did my PhD (in the 'heart' of England).
Why have you chosen this particular experiment? - i.e., in which facet of Arabidopsis spp. are you interested? According to the authors, the study relates to the AtGenExpress project and their samples consist of data from "different tissues and different developmental stages in wild type Columbia (Col-0) and various mutants".
I was able to find metadata in the Series Matrix File. So, if I wanted to compare the various mutants to wild-types, I would first have to produce my own metadata file:
Limma does not like hyphens (group names have to be syntactically valid R names), so we'll have to erase these:
metadata$Type <- gsub("-", "", metadata$Type)
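The metadata table itself is not reproduced here, so the following is only a hypothetical stand-in to show the clean-up step. The sample IDs and the "mutant-A" label are invented; "Col-0" is the wild-type name quoted above.

```r
# Hypothetical metadata: sample IDs and types are invented for illustration
metadata <- data.frame(
  Sample = c("sample1", "sample2", "sample3", "sample4"),
  Type   = c("Col-0", "Col-0", "mutant-A", "mutant-A"),
  stringsAsFactors = FALSE
)

# Compare each value with its syntactically valid form to spot problem names
make.names(unique(metadata$Type)) == unique(metadata$Type)

# Remove the hyphens, then store the groups as a factor
metadata$Type <- gsub("-", "", metadata$Type)
metadata$Type <- factor(metadata$Type, levels = c("Col0", "mutantA"))
```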
I should also have the expression data. In your case, this would have been produced by rma / gcrma, and you may have to extract the expression values by using the exprs() function on the object produced by your rma / gcrma function.
It's then key to match the order of samples in your expression matrix with the order of values in the metadata. In my example, they match:
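One quick way to check this, sketched with made-up objects (expr, metadata and the Sample column are assumed names, not the ones from the real analysis):

```r
# Toy expression matrix: 3 probe sets x 4 samples (values are simulated)
set.seed(1)
expr <- matrix(rnorm(12, mean = 8), nrow = 3,
               dimnames = list(paste0("probeset", 1:3), paste0("sample", 1:4)))

# Toy metadata listed in the same sample order as the matrix columns
metadata <- data.frame(Sample = paste0("sample", 1:4),
                       Type = c("Col0", "Col0", "mutantA", "mutantA"),
                       stringsAsFactors = FALSE)

# TRUE only if every column of expr lines up with the corresponding metadata row
samples_match <- identical(colnames(expr), as.character(metadata$Sample))
samples_match
```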
If they do not match, you will have to re-order one or the other using match() or some other function. The remainder is then easy:
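The original code for this step is not shown here, so what follows is only a sketch of a typical limma workflow on simulated data, not the exact analysis from this thread. The sample names, the group labels (Col0, mutantA) and the mutant-versus-wild-type contrast are assumptions.

```r
library(limma)  # Bioconductor package

# Simulated log2 expression matrix: 100 probe sets x 6 samples
set.seed(42)
expr <- matrix(rnorm(600, mean = 8, sd = 1), nrow = 100,
               dimnames = list(paste0("probeset", 1:100), paste0("sample", 1:6)))

# Hypothetical metadata, deliberately listed in a different order from the matrix
metadata <- data.frame(Sample = paste0("sample", 6:1),
                       Type = rep(c("mutantA", "Col0"), each = 3),
                       stringsAsFactors = FALSE)

# Re-order the metadata rows so that they follow the matrix columns
metadata <- metadata[match(colnames(expr), metadata$Sample), ]
stopifnot(identical(colnames(expr), metadata$Sample))

# Design matrix with one column per group
type <- factor(metadata$Type, levels = c("Col0", "mutantA"))
design <- model.matrix(~ 0 + type)
colnames(design) <- levels(type)

# Fit the linear model, then test mutant versus wild type
fit <- lmFit(expr, design)
contrast_matrix <- makeContrasts(mutantA - Col0, levels = design)
fit2 <- eBayes(contrasts.fit(fit, contrast_matrix))

# Table of results for all probe sets, with Benjamini-Hochberg adjusted p-values
results <- topTable(fit2, coef = 1, number = Inf, adjust.method = "BH")
head(results)
```

Because the data are random noise, no probe set is expected to stand out here; with real data, the same table is what you would filter on adj.P.Val and logFC.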