Following this definition, all deterministic methods are 100% reliable, because they always reproduce the same result when repeated. Reliability is, of course, important for measurements, but data transformations are not measurements. There are some statistical methods that are not deterministic, for example those involving EM algorithms or k-means clustering, but no normalization method I know of falls into that category.
So, my advice: check whether the methods are deterministic; if they are, they are reliable by definition. The question of reliability is certainly relevant for measurement techniques such as microarrays, qPCR, and RNA-seq, but it is completely solved for normalization (in short: ALL these methods are deterministic and therefore reliable). If you are looking for a problem to solve in normalization, this is definitely not the right place.
BTW: one can easily assess the reliability. If you want to check RMA, loess normalization, mean or quantile normalization, just run it on the same input data, say, 1000 times and compare the results.
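For instance, a minimal sketch of such a check in Python/numpy; the `normalize` function here is just a stand-in (log2 plus median centering) for whatever method you actually want to test:

```python
import numpy as np

def normalize(x):
    # Stand-in for the method under test (e.g. loess or quantile normalization);
    # here just a log2 transform plus per-array median centering.
    logged = np.log2(x + 1)
    return logged - np.median(logged, axis=0)

rng = np.random.default_rng(0)
data = rng.lognormal(mean=8, sigma=1, size=(1000, 6))  # fake intensities: 1000 probes x 6 arrays

# Run the same method on the same input many times ...
results = [normalize(data) for _ in range(1000)]
# ... a deterministic method gives bit-identical output every time.
print(all(np.array_equal(results[0], r) for r in results))  # True
```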
BTW2: since RMA (robust multichip average) was mentioned: it is not (only) normalization; it comprises background subtraction, quantile normalization (a completely deterministic method), and intensity summarization.
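For illustration, here is a bare-bones quantile normalization in Python/numpy (not the Bioconductor implementation, and ignoring tie handling); the point is only that the procedure is a fixed mapping of the input matrix, so repeated runs agree exactly:

```python
import numpy as np

def quantile_normalize(x):
    """Rows = probes, columns = arrays; ties are not handled specially."""
    order = np.argsort(x, axis=0)         # order[k, j] = index of the k-th smallest value in array j
    ranked = np.sort(x, axis=0)
    reference = ranked.mean(axis=1)       # mean across arrays at each rank = reference distribution
    out = np.empty_like(x, dtype=float)
    for j in range(x.shape[1]):
        out[order[:, j], j] = reference   # map reference values back to each array by rank
    return out

x = np.array([[5., 4., 3.],
              [2., 1., 4.],
              [3., 4., 6.],
              [4., 2., 8.]])
# Same input, same output: the transformation is deterministic.
print(np.array_equal(quantile_normalize(x), quantile_normalize(x)))  # True
```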
Edit: Just to qualify what I said above: there are some reliability issues with normalization. I just saw a message on the Bioconductor list noting differences in results when running GCRMA on Windows versus Linux. As said, most normalization and summarization methods are deterministic as long as the data and methods stay the same. However, there can be variation at the probe level, even when using the same array design. The most common source of such differences is that the array annotation, and thereby the probe-level groupings and their assignment to genes, has changed.
This is sort of a "pseudo-(un)reliability", because if all parameters are the same, the results are the same. But the annotations are changed frequently, and annotation updates are mostly pulled in automagically without the user noticing the difference. This is especially true for the Affy platform.
Can you give a short definition of what you mean by reliability in the statistical sense? I confess I had to look up the definition myself, but if it is as in Wikipedia ("In statistics, reliability is the consistency of a set of measurements or measuring instrument, often used to describe a test. Reliability is inversely related to random error."), then the question makes no sense, because most or all normalization methods are deterministic.
It's true that many methods are deterministic. But the most widely used ones are model-based, and hence stochastic/statistical in nature (e.g. RMA). If one treats normalization as an experiment (which it effectively is), this question makes a lot of sense, though.
It is totally wrong that a method becomes non-deterministic just because it involves 'a model'! A linear model, for example, given the same data, reproduces the same results, always. So yes, reliability is of "utmost importance", but it is solved.
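For example, a quick check with an ordinary least-squares fit on made-up data (Python/numpy here, nothing specific to any normalization package) shows the estimates are bit-identical across repeated fits:

```python
import numpy as np

rng = np.random.default_rng(42)
X = np.column_stack([np.ones(50), rng.normal(size=50)])   # intercept + one covariate
y = 2.0 + 3.0 * X[:, 1] + rng.normal(scale=0.1, size=50)  # noisy linear response

# Fit the same linear model on the same data repeatedly ...
fits = [np.linalg.lstsq(X, y, rcond=None)[0] for _ in range(100)]
# ... same data, same model, same coefficient estimates, every time.
print(all(np.array_equal(fits[0], b) for b in fits))  # True
```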
RMA depends on a linear statistical model; you can check the paper if you want. I agree with you that if you use a given linear model and OLS, it will give the same results. Still, it is a statistical model. But normalization is not so simple! People use a wealth of techniques. If it were just linear regression, this question would be trivial. But since it depends in many instances on M-estimators, specific training sets, etc., I still think it's not solved. Otherwise, people wouldn't gather in a room for two days to discuss which one is the most reliable.
I think asking for the "best method" ends up being less productive than asking for opinions on the strengths and weaknesses of a few existing methods.