Dear all,
Could you please tell me the main principles of validation in microarray-based gene expression meta-analysis studies? First of all, I personally think that a meta-analysis is itself a validation study, so it should not be necessary to perform additional validation tests for this kind of study. However, my colleagues suggest that we still need validation. I have already looked around, but I still cannot make up my mind.
- Some papers used RT-PCR to validate their results. (1)
- Some papers assessed the similarity between their data sets and another large data set. However, I wonder whether this is better than combining them all into one meta-analysis data set. (2)
- Some papers divided their data into a training set and a testing set, or used leave-one-out cross-validation (LOOCV). (3)
- Others.
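For approach (3), the key detail is that any gene selection must be repeated inside each fold, otherwise the held-out sample leaks into the signature and the accuracy estimate is optimistic. A minimal LOOCV sketch, assuming two classes, a simple nearest-centroid classifier and a "top-k genes by mean difference" selection rule (all illustrative choices, not a standard recipe), using only the Python standard library:

```python
from statistics import mean

def centroid(rows):
    # Per-gene mean across the given samples (rows = samples, columns = genes)
    return [mean(col) for col in zip(*rows)]

def select_genes(X, y, k):
    # Rank genes by |difference in class means| using ONLY the samples passed in
    # (the training fold). Assumes exactly two classes.
    classes = sorted(set(y))
    cents = [centroid([r for r, lab in zip(X, y) if lab == c]) for c in classes]
    scores = [abs(a - b) for a, b in zip(cents[0], cents[1])]
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:k]

def predict_nearest_centroid(train_X, train_y, x):
    # Assign x to the class whose centroid is closest (squared Euclidean distance)
    def dist(c):
        cent = centroid([r for r, lab in zip(train_X, train_y) if lab == c])
        return sum((a - b) ** 2 for a, b in zip(cent, x))
    return min(sorted(set(train_y)), key=dist)

def loocv_accuracy(X, y, k):
    correct = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        # Gene selection is redone in every fold so the held-out sample
        # never influences which genes are used (avoids selection bias).
        genes = select_genes(train_X, train_y, k)
        train_sub = [[r[g] for g in genes] for r in train_X]
        test_sub = [X[i][g] for g in genes]
        if predict_nearest_centroid(train_sub, train_y, test_sub) == y[i]:
            correct += 1
    return correct / len(X)

# Invented toy data: 6 samples x 4 genes, gene 0 separates the groups
X = [
    [5.0, 1.0, 3.0, 2.0], [5.5, 1.2, 2.9, 2.1], [4.8, 0.9, 3.1, 1.9],
    [2.0, 1.1, 3.0, 2.0], [2.2, 1.0, 2.8, 2.2], [1.9, 1.2, 3.2, 1.8],
]
y = ["case"] * 3 + ["control"] * 3
print(loocv_accuracy(X, y, k=1))  # 1.0
```

Note that this is still internal validation: it estimates how well the signature generalises within the same data, which is why it is often paired with an external or experimental check such as (1).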
Some papers combined (3) and (1). As I understand it, they divide their study into 'statistical validation' and 'experimental validation'. Does it make sense to conduct a study on human samples and then validate it with cell line gene expression data?
Thank you.
Thank you very much for your insightful comment. Since all available methodologies have their own issues, could you please advise me on choosing a standard method/approach with an acceptable risk of error?
Let's suppose that I collected all available data sets related to my research hypothesis on human samples for a specific disease. I conducted a microarray-based gene expression meta-analysis and got a list of DE genes, enriched pathways, hub genes (from network analysis), etc. Then I selected a couple of genes based on the statistical results and on previously reported mechanistic studies. OK, I could stop at this stage and publish the results in a scientific journal.
However, I want to validate my results to gain confidence. OK, then it is time for validation. What should I do if I do not have a cohort of human samples? Conduct RT-PCR on cell lines (this approach sounds weird to me)? Use statistical models (training/testing groups), a machine learning approach, ...?
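For context, one common way to get a per-gene meta-analysis result is to pool each study's effect size (e.g. log fold change) with inverse-variance weights. This is only a sketch of a fixed-effect model, assuming you already have a per-study effect and standard error for the gene; it is not necessarily the method used here, and random-effects models, p-value combination or rank-based methods are frequent alternatives:

```python
from math import erfc, sqrt

def fixed_effect_meta(effects, std_errors):
    # Inverse-variance weights: more precise studies count more
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * d for w, d in zip(weights, effects)) / total
    pooled_se = sqrt(1.0 / total)
    z = pooled / pooled_se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|))
    p = erfc(abs(z) / sqrt(2))
    return pooled, pooled_se, z, p

# Hypothetical log fold changes and standard errors for one gene in 3 studies
pooled, se, z, p = fixed_effect_meta([0.8, 1.1, 0.6], [0.3, 0.4, 0.25])
print(f"pooled logFC={pooled:.3f}, SE={se:.3f}, z={z:.2f}, p={p:.2g}")
```

A fixed-effect model assumes all studies share one true effect; with heterogeneous cohorts a random-effects model (e.g. DerSimonian-Laird) is usually safer, and the per-gene p-values still need multiple-testing correction across genes.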
If you're using other publicly available datasets, then the way to use them as a form of validation is to check whether your interesting observation is also seen in that public dataset, independently of your original dataset. If the observation is seen in both your dataset and a publicly available dataset, that adds a lot of weight to your argument, as they are independent observations.
Not always; some journals would probably still insist on experimental validation (qPCR) at this stage.
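A simple way to check whether an observation "is seen" in an independent dataset is to ask whether the selected genes change in the same direction there. A minimal sketch, assuming you have log fold changes from the discovery meta-analysis and from a public dataset that was NOT part of it; the gene names and values are placeholders, and the one-sided sign test is just one illustrative choice:

```python
from math import comb

def direction_concordance(discovery, validation):
    # Genes present in both sets with a non-zero fold change in each
    shared = [g for g in discovery
              if g in validation and discovery[g] != 0 and validation[g] != 0]
    agree = sum(1 for g in shared if (discovery[g] > 0) == (validation[g] > 0))
    n = len(shared)
    # One-sided sign test: P(X >= agree) when X ~ Binomial(n, 0.5)
    p = sum(comb(n, k) for k in range(agree, n + 1)) / 2 ** n
    return agree, n, p

# Placeholder log fold changes (not real data)
discovery = {"GENE_A": 1.2, "GENE_B": -0.8, "GENE_C": 0.9, "GENE_D": -1.5, "GENE_E": 0.4}
validation = {"GENE_A": 0.7, "GENE_B": -0.5, "GENE_C": 1.1, "GENE_D": -0.9,
              "GENE_E": -0.1, "GENE_F": 2.0}
print(direction_concordance(discovery, validation))  # (4, 5, 0.1875)
```

With only a handful of genes the sign test has little power (4 of 5 agreeing is not significant here), so per-gene tests in the validation set, or larger signatures, are often reported alongside it.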
If you have no material left from your patients, then you need to state that as your rationale for using a publicly available dataset as an independent validation.
The bottom line to all of this (as I stated in my answer above) is that the observation you are trying to validate needs to be seen outside of that dataset. The best case is something like qPCR in samples not in the original cohort; second to that is using a publicly available dataset as an independent validation.
Thank you very much for your advice. It is now clear to me.