Hi all,
I need to simulate a microarray data set with paired samples. Specifically, I'd like to generate a matrix with 1000 genes (rows) and 20 samples (columns) from 10 patients, with the first 100 genes consistently differentially expressed in all patients. By "consistently" I mean that these genes must be up-regulated or down-regulated with the same fold change in every patient. In other words, assuming that the first 10 columns are the "normal" samples and the last 10 columns are the "tumor" samples from the same patients, I want the first 50 genes to be up-regulated 3-fold in the tumor of every patient, and the next 50 genes to be down-regulated 3-fold in every patient.
Can anyone help me?
What kind of data are you hoping to end up with?
When you talk about microarray data, do you mean fluorescence intensities or A/B ratios? There are quite a few different kinds of microarray.
If you want expression data, the values could be random numbers in the range 0-20. R has many easy-to-use random number generators (for example rgamma), and you can build the matrix with those. Look into the family of Gamma distributions and choose parameters that suit your expectations.
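As a rough illustration of the Gamma suggestion, here is a minimal sketch in Python (standard library only) rather than R. The shape and scale values are my own assumptions, picked so that the mean is 8 and most values land roughly in the 0-20 range; they are not taken from any real array platform.

```python
import random

N_GENES = 1000    # rows, as in the question
N_SAMPLES = 20    # columns, as in the question

# Assumed parameters (not from the thread): shape 4 and scale 2 give a
# mean of shape * scale = 8 and keep most values roughly within 0-20.
SHAPE = 4.0
SCALE = 2.0


def simulate_baseline(n_genes, n_samples, shape, scale, seed=1):
    """Return an n_genes x n_samples list of lists of Gamma-distributed values."""
    rng = random.Random(seed)  # fixed seed so the simulation is reproducible
    return [[rng.gammavariate(shape, scale) for _ in range(n_samples)]
            for _ in range(n_genes)]


matrix = simulate_baseline(N_GENES, N_SAMPLES, SHAPE, SCALE)
print(len(matrix), len(matrix[0]))
```

This only produces unstructured background values; it contains no patient pairing and no differential expression yet.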
Hi again, this is the code I generated.
Do you think it is correct?
Well, it does appear to do what you asked for. In that sense it works. However, the perfect correlation and the perfect 3-fold change you have constructed are completely unrealistic. The simulation looks nothing like real data, so any experiments you run on it will be meaningless.
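One possible way to make the simulation less idealized, sketched in Python (standard library only) rather than R: work on a log2 scale, give each patient a baseline shared by the normal and tumor sample, and add independent measurement noise to every sample. The function name and all numeric parameters (baseline mean 8, patient SD 0.5, noise SD 0.3) are assumptions for illustration, not values from the thread.

```python
import math
import random


def simulate_paired(n_genes=1000, n_patients=10, n_up=50, n_down=50,
                    fold=3.0, noise_sd=0.3, seed=1):
    """Return (normal, tumor): two n_genes x n_patients matrices of log2 values."""
    rng = random.Random(seed)
    log_fc = math.log2(fold)  # a 3-fold change is about 1.585 on the log2 scale
    normal, tumor = [], []
    for g in range(n_genes):
        gene_level = rng.gauss(8.0, 2.0)  # assumed per-gene baseline
        if g < n_up:
            shift = log_fc        # first 50 genes: up in tumor
        elif g < n_up + n_down:
            shift = -log_fc       # next 50 genes: down in tumor
        else:
            shift = 0.0           # remaining genes: no true change
        n_row, t_row = [], []
        for _ in range(n_patients):
            # Patient-specific level shared by both samples of the pair.
            patient_level = gene_level + rng.gauss(0.0, 0.5)
            # Independent noise per sample makes the fold change imperfect.
            n_row.append(patient_level + rng.gauss(0.0, noise_sd))
            t_row.append(patient_level + shift + rng.gauss(0.0, noise_sd))
        normal.append(n_row)
        tumor.append(t_row)
    return normal, tumor


normal, tumor = simulate_paired()
# 1000 x 20 matrix: normals in the first 10 columns, tumors in the last 10.
matrix = [n_row + t_row for n_row, t_row in zip(normal, tumor)]
```

With this layout, column i and column i + 10 belong to the same patient, so the paired difference for a gene is tumor minus normal within each patient. Raising noise_sd makes the fold change less uniform across patients.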
Hi,
Thank you, Karl.
I added a bit of noise.
Hi Karl and Michael,
Thank you very much. Karl, yes, I was referring to expression data of the kind generated by one-color microarray technology.
Michael, I don't want to see differences between means; I want differences that are consistent across all patients. For example, a differentially expressed gene should be one whose expression increased (or decreased) ~3-fold in every patient. If you consider only the mean, you can also pick up genes whose mean changes between normal and tumor just because it changes in a subset of patients.
Thank you again
Pas
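To make the distinction between a mean difference and a consistent per-patient difference concrete, here is a small Python sketch. The two genes are invented toy values (paired log2 differences, tumor minus normal, one per patient), and the 1.0 cutoff (at least 2-fold) is an arbitrary assumption.

```python
import math


def summarize(diffs, min_log2fc):
    """Return (mean paired log2 difference, number of patients at or above the cutoff).

    Only up-regulation is counted here; down-regulation would use d <= -min_log2fc.
    """
    mean_diff = sum(diffs) / len(diffs)
    n_consistent = sum(1 for d in diffs if d >= min_log2fc)
    return mean_diff, n_consistent


THRESHOLD = 1.0  # assumed cutoff: at least 2-fold up within a patient

# Toy paired differences for 10 patients (illustrative, not real data).
consistent_gene = [math.log2(3)] * 10   # 3-fold up in every patient
subset_gene = [4.0] * 4 + [0.0] * 6     # 16-fold up in 4 patients, unchanged in 6

print(summarize(consistent_gene, THRESHOLD))
print(summarize(subset_gene, THRESHOLD))
```

Both toy genes have a mean log2 difference close to 1.6, but only the first passes the cutoff in all 10 patients; the second passes in just 4. Counting patients per gene is one simple way to separate the two cases, which a comparison of group means alone would not do.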