I have a transcriptome dataset from Affymetrix microarrays in which multiple cell types were probed. I would like to obtain the differentially expressed genes (i.e., significant transcripts) for each of the cell types using packages in R. My data is in the form of a matrix with genes as rows and samples as columns. I tried using the csSAM package but did not quite know how to do it. Is there any other way or package to do this?
Nobody "quite knows how to do it" the first time :) With what, precisely, are you having trouble? Have you read the documentation? Is there an example or tutorial that you are trying to follow? Are you seeing an error message that you don't understand? It's easier to help if your question is better defined.
There are lots of R packages for this task - limma is popular and well-documented - but in all cases you need to make some effort to figure out how they work.
True. I should have given a clearer example.
My dataset looks like this:
The dimensions are 22810 (genes) x 20 (cell types).
I tried to use the csSAM package in R.
Here G is my expression matrix, with genes as rows and cell types as columns, and cc stands for the matrix of cell frequencies (n by k: n samples, k cell types).
The example from the package had several samples.
I am not clear on how to construct the cell-frequency matrix because I have only one sample.
With only one sample per cell type, you are in uncharted territory. What types of arrays were used to generate these data?
Affymetrix arrays (Arabidopsis ATH1 chip)
Are you sure the data you have shown is one sample per cell type, i.e., only one column for each cell type, or is it some sort of average you are using?
Yes, each column is the average of three replicates.
You'll definitely want to get the data in the un-averaged form. With the averaged replicates, you will be unable to apply statistical methods to determine differential expression, since averaging leaves you with no replicates from which to estimate variability. Once you have the data with three replicates per cell type, I refer you to my answer below for multi-group comparison using limma.
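For orientation, here is a minimal sketch of what a multi-group limma analysis with three replicates per cell type could look like. Everything about the data is an assumption for illustration: the expression matrix is simulated, and the cell-type labels A to D, the sample names, and the choice of pairwise contrasts are hypothetical. With real ATH1 data you would substitute your own normalized log2 expression matrix.

```r
library(limma)

set.seed(1)

# Hypothetical layout: 4 cell types, 3 replicates each (12 arrays in total)
cell_types <- c("A", "B", "C", "D")
group <- factor(rep(cell_types, each = 3), levels = cell_types)

# Simulated log2 expression matrix (100 genes x 12 samples), for illustration only
expr <- matrix(rnorm(100 * 12, mean = 8, sd = 0.3), nrow = 100,
               dimnames = list(paste0("gene", 1:100),
                               paste0(group, "_rep", rep(1:3, times = 4))))
# Make the first 5 genes clearly higher in cell type A
expr[1:5, group == "A"] <- expr[1:5, group == "A"] + 3

# One coefficient per cell type (no intercept)
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Fit the linear model, then test A against each other cell type
fit <- lmFit(expr, design)
contrasts <- makeContrasts(AvsB = A - B, AvsC = A - C, AvsD = A - D,
                           levels = design)
fit2 <- eBayes(contrasts.fit(fit, contrasts))

# Full ranked table for one contrast, with BH-adjusted p-values
top_AvsB <- topTable(fit2, coef = "AvsB", number = Inf, adjust.method = "BH")

# -1 / 0 / 1 calls per gene and contrast at an adjusted p-value of 0.05
calls <- decideTests(fit2, p.value = 0.05)

# Genes called up in A relative to every other cell type
up_in_A <- rownames(calls)[rowSums(calls == 1) == ncol(calls)]
```

The same pattern extends to 20 cell types; only the group factor and the set of contrasts change. None of this works on the averaged matrix, because lmFit needs the replicate columns to estimate per-gene variance.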
Thank you. Yes, I am trying out different contrasts that I could apply to get what I want. I think limma would be the best option to do this.
So suppose I try this particular contrast: I have cell types A, B, C and D, and I'd use limma and define a contrast like "A - (B+C+D)/3": contrast.matrix <- makeContrasts("p1-(p2+p3+p4)/3", levels=design). I tried this and got some DEGs. So will those be the genes expressed only in A and not in the others?
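One way to see what that contrast measures is to apply it to made-up group means for a single gene. The sketch below is only arithmetic, not an analysis: the means are invented, and p1 to p4 are assumed to be the design column names for A to D. It suggests that "A minus the average of the rest" can be non-zero even when A does not differ from one of the other cell types, so a gene flagged by it is not necessarily specific to A.

```r
library(limma)

# Invented mean log2 expression of one gene in four cell types:
# p1 (A) and p2 (B) are equally high, p3 (C) and p4 (D) are low
means <- c(p1 = 10, p2 = 10, p3 = 4, p4 = 4)

cm <- makeContrasts(
  AvsRest = p1 - (p2 + p3 + p4) / 3,  # A versus the average of B, C, D
  AvsB    = p1 - p2,
  AvsC    = p1 - p3,
  AvsD    = p1 - p4,
  levels  = names(means)
)

# Contrast estimates = contrast weights applied to the group means
est <- drop(t(cm) %*% means)
print(est)
# AvsRest = 10 - (10 + 4 + 4)/3 = 4, although AvsB = 10 - 10 = 0
```

If the goal is genes that stand out in A against every other cell type, one option is to test the pairwise contrasts (A - B, A - C, A - D) and keep genes that are significant, in the same direction, in all of them.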