Is there a commonly accepted prior distribution on gene expression from microarray experiments?
I'm interested in any priors used in microarray analysis that are biologically meaningful. For example, is a Gaussian prior most appropriate for log2 transformed normalised oligo data? If so, is there a good reason for this?
I'm asking as Wang et al seem to generate a prior using data from one ('Lymphochip') microarray and then update this "prior" using data from another (Affy) microarray. I'm not convinced this is particularly "Bayesian", and would be more comfortable given a prior derived from some understanding of how the data should be distributed, which is then updated using both the Affy and Lymphochip data.
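One way to see what is at stake here is a toy conjugate example. This is only a minimal sketch, not the model from Wang et al.: it assumes a single gene, a Normal prior on its mean log2 expression, a known measurement variance, and that both platforms report values on the same scale. All numbers and names (`update_normal`, `lymphochip`, `affy`) are made up for illustration. Under those assumptions, updating the prior with one data set and then the other gives exactly the same posterior as updating once with both.

```python
import math

def update_normal(prior_mean, prior_var, data, noise_var):
    """Conjugate update of a Normal prior on a mean, with known noise variance."""
    n = len(data)
    if n == 0:
        return prior_mean, prior_var
    post_precision = 1.0 / prior_var + n / noise_var
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    return post_mean, post_var

# Hypothetical log2 expression values for one gene (illustrative only).
lymphochip = [7.9, 8.3, 8.1]
affy = [8.6, 8.4]

# Assumed prior and measurement variance; neither comes from the paper.
prior_mean, prior_var = 8.0, 4.0
noise_var = 0.25

# Sequential: prior -> Lymphochip -> Affy.
step_mean, step_var = update_normal(prior_mean, prior_var, lymphochip, noise_var)
seq_mean, seq_var = update_normal(step_mean, step_var, affy, noise_var)

# Batch: prior -> both data sets at once.
batch_mean, batch_var = update_normal(prior_mean, prior_var, lymphochip + affy, noise_var)

print(seq_mean, seq_var)
print(batch_mean, batch_var)
```

So the order of updating is not the issue in itself. The question is whether the starting prior is sensible and whether the two platforms can really be treated as measurements on a common scale, which this sketch simply assumes.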
I'd be curious to know how others felt about this approach, too!
This is a pretty complex issue. Since the paper seems to be six years old, I would search the citing literature and see what type of validation or critique the method has gained over the years.
Starting from these earlier papers, the models get progressively more complicated as people build hierarchical models to represent gene expression. I was hoping that this paper might serve as an example of a prior distribution, and allow an answer that focuses on priors on gene expression (or fold change or whatever) rather than getting caught up in wider modelling issues. I think you're right, though - I need to follow the literature along and see how people combine multiple data sources...
Are you trying to combine different microarray datasets? What are you trying to achieve by doing this?
Right now, yes: I'm trying to combine different microarray data sets, though I tried to keep the question pretty general because I'd like to start getting some basic understanding of gene expression from a data-centric point of view. Over the last 6 months I've kind of jumped into microarray analysis head first without really covering the basics.
Have you considered something like RankProd (see here ...)? All you need is lists of differentially expressed genes in order to do this, and you don't combine the underlying expression values.
I don't know if you've seen that before or not. Hope it helps.
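For anyone unfamiliar with the idea, here is a rough sketch of the rank product statistic itself: for each gene, take the geometric mean of its ranks across the studies. This is not the RankProd package's interface. The function name `rank_product` and the gene labels are hypothetical, it assumes every gene appears in every list, and it leaves out the permutation step used to assess significance.

```python
import math

def rank_product(rankings):
    """Geometric mean of each gene's rank across several ranked lists.

    rankings: list of lists of gene names, each ordered from most to
    least differentially expressed. Every list must hold the same genes.
    """
    genes = set(rankings[0])
    for ranking in rankings:
        if set(ranking) != genes or len(ranking) != len(genes):
            raise ValueError("every ranking must contain the same genes exactly once")
    k = len(rankings)
    log_sum = {gene: 0.0 for gene in genes}
    for ranking in rankings:
        for position, gene in enumerate(ranking, start=1):
            # Sum logs rather than multiplying ranks, to avoid huge products.
            log_sum[gene] += math.log(position)
    return {gene: math.exp(total / k) for gene, total in log_sum.items()}

# Hypothetical ranked gene lists from three studies (illustrative only).
study_a = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
study_b = ["GENE_A", "GENE_C", "GENE_B", "GENE_D"]
study_c = ["GENE_B", "GENE_A", "GENE_C", "GENE_D"]

rp = rank_product([study_a, study_b, study_c])
# Smaller rank product = more consistently near the top of the lists.
combined_order = sorted(rp, key=rp.get)
print(combined_order)
```

Because only the ranks are used, the expression values from the different platforms never have to be put on a common scale.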
Thanks for the RankProd pointer. One of the reasons I was starting to look at more complex models of expression was to assess the potential of combining RT-PCR data with array data. I'm pretty sure the numbers emerging from these analyses will be in completely different spaces, and hence a model of expression would become pretty important. And coming from a discipline that suggests "model, don't normalize", one of the first questions to think about is my prior distribution. I'm starting to think, though, that this is not a common approach...