Question

Methods for validation in microarray-based gene expression meta-analysis studies

0

8.2 years ago

BioMed ▴ 50

Dear everybody,

Could you please outline the main principles for validation in microarray-based gene expression meta-analysis studies? First of all, I personally think that a meta-analysis is itself a validation study. Therefore, I do not think it is necessary to perform validation tests for this kind of study. However, my colleagues suggest that we still need validation. I have already looked around, but I still cannot make up my mind.

Some papers used RT-PCR to validate their results. (1)
Some papers used the similarity between their data sets and another large data set. However, I wonder whether this is better than combining them into one meta-analysis data set? (2)
Some papers divided their data into a training set and a testing set, or used leave-one-out cross-validation (LOOCV). (3)
Others.

Some papers combined (3) and (1). As I understand it, they divide their validation into 'statistical validation' and 'experimental validation'. Does it make sense to conduct a study on human samples and validate it with cell line gene expression data?

Thank you.

gene expression meta-analysis validation • 2.7k views

ADD COMMENT • link updated 8.2 years ago by andrew.j.skelton73 6.6k • written 8.2 years ago by BioMed ▴ 50

score 2 · Answer 1 · 2016-09-12

2

8.2 years ago

andrew.j.skelton73 6.6k

I personally think that meta-analysis itself is a validation study

Strong statement. It depends on how you're performing the meta-analysis. I'm all for using other publicly available data, but a lot of the time it is used (in some cases dangerously) as a "power boost" of sorts, which leaves people still clutching at straws in their data. Put simply, to validate an observation you need to repeat the experiment with an independent cohort (often not feasible), or use qPCR.

Some papers used RT-PCR to validate their results.

Probably used most frequently, as it's an observation independent from the dataset you drew the hypothesis from... and it's relatively cheap.
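For what it's worth, here is a minimal sketch of how a qPCR result is usually compared against an array result, assuming the common 2^-ddCt approach with roughly 100% amplification efficiency. The function names and the Ct values in the usage note are hypothetical, purely for illustration.

```python
def log2_fold_change_ddct(ct_target_case, ct_ref_case, ct_target_ctrl, ct_ref_ctrl):
    """Return the log2 fold change (case vs control) from mean Ct values.

    Assumes the 2^-ddCt method with ~100% amplification efficiency.
    """
    # Normalise the target gene to the reference (housekeeping) gene
    d_ct_case = ct_target_case - ct_ref_case
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    # Difference between conditions; a lower Ct means more transcript
    dd_ct = d_ct_case - d_ct_ctrl
    # fold change = 2 ** (-dd_ct), so its log2 is simply -dd_ct
    return -dd_ct


def same_direction(array_log2fc, qpcr_log2fc):
    """True if the array and qPCR estimates agree on up/down regulation."""
    return array_log2fc * qpcr_log2fc > 0
```

With made-up mean Ct values of 22 (target) and 18 (reference) in cases, and 24 and 18 in controls, `log2_fold_change_ddct(22.0, 18.0, 24.0, 18.0)` gives a log2 fold change of 2.0, i.e. roughly fourfold up, which you would then compare with the direction seen on the array.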

Some papers used the similarity between their data sets and another large data set. However, I wonder whether this is better than combining them into one meta-analysis data set? (2)

Again, building a combined "meta-analysis data set" is a dangerous thing to do if you don't understand the statistical caveats of what you're trying to achieve. Merging datasets is not a trivial task. In most cases this also isn't validation so much as an experimental power boost. It would be better to check whether observation X from your dataset also appears in a publicly available dataset (though there isn't always a publicly available dataset similar to yours).

Some papers divided their data into a training set and a testing set, or used leave-one-out cross-validation (LOOCV). (3)

Machine learning approaches such as these carry their own issues. A random split into training and test sets should be repeated over many permutations to avoid false positive results (often overlooked). The issue with this approach is that you're still using the same dataset to try to validate an observation.
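As a rough illustration of the point about permutation, here is one possible reading of it, sketched in plain Python: repeat the random split many times and average the test accuracy, then shuffle the class labels to see how often chance alone does as well. The nearest-centroid classifier and all function names are assumptions made for the example, not something taken from a particular paper or package.

```python
import random


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Fit per-class mean profiles on the training set, score on the test set."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    correct = 0
    for x, y in zip(X_test, y_test):
        # Predict the class whose centroid is closest (squared Euclidean distance)
        pred = min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
        correct += pred == y
    return correct / len(y_test)


def repeated_split_accuracy(X, y, n_splits=50, test_fraction=0.3, rng=None):
    """Mean test accuracy over many random train/test splits."""
    rng = rng or random.Random(0)
    n = len(y)
    n_test = max(1, int(round(n * test_fraction)))
    scores = []
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        scores.append(nearest_centroid_accuracy(
            [X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test]))
    return sum(scores) / len(scores)


def label_permutation_p_value(X, y, n_permutations=200, seed=0):
    """Empirical p-value: how often shuffled labels match the observed accuracy."""
    rng = random.Random(seed)
    observed = repeated_split_accuracy(X, y, rng=random.Random(seed))
    hits = 0
    for _ in range(n_permutations):
        shuffled = list(y)
        rng.shuffle(shuffled)  # break any real link between samples and labels
        if repeated_split_accuracy(X, shuffled, n_splits=20, rng=rng) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero
    return observed, (hits + 1) / (n_permutations + 1)
```

Here `X` is assumed to be a list of per-sample expression vectors for a gene set fixed in advance. If the genes are chosen from the data, that selection has to be redone inside each training split, otherwise the test accuracy is optimistic. Even a clean result here is still internal to one dataset.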

The bottom line is that if you find an interesting observation in your dataset, and you want to validate it, then you need to see the same observation outside of that dataset, independent of the original cohort.

0

Thank you very much for your insightful comment. Since all available methodologies have their own issues, could you please give me your advice on choosing a standard method/approach with an acceptable risk of error?

Let's suppose that I collected all available data sets from human samples that relate to my research hypothesis for a specific disease. I conducted a microarray-based gene expression meta-analysis and got a list of DE genes, enriched pathways, hub genes (from network analysis), etc. Then I selected a couple of genes based on the statistical results and on previous reports from mechanistic studies. OK, I could stop at this stage and publish the results in a scientific journal.

However, I want to validate my results to gain confidence in them. OK, then it is time for validation. What should I do if I cannot obtain a cohort of human samples? Conduct RT-PCR on cell lines (this approach sounds weird to me)? Use statistical models (training/testing groups), a machine learning approach, ...?

ADD REPLY • link 8.2 years ago by BioMed ▴ 50

2

If you're using other publicly available datasets, then the way to use them as a form of validation is to see whether your interesting observation also appears in the public dataset, independent of your original dataset. If the observation is seen in both your dataset and a publicly available dataset, that adds a lot of weight to your argument, as they're independent observations.
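One simple way to put a number on "seen in both datasets" is to check whether your selected genes change in the same direction in the independent dataset. The sketch below is only an illustration under stated assumptions: the log2 fold changes are computed separately in each dataset, the gene IDs and function names are hypothetical, and the sign test treats genes as independent, which co-expressed genes are not, so the p-value is only approximate.

```python
from math import comb


def sign_concordance(discovery_log2fc, replication_log2fc):
    """Count genes whose log2 fold change has the same sign in both datasets.

    Both arguments are dicts mapping gene ID -> log2 fold change; only genes
    measured in both datasets are compared.
    """
    shared = [g for g in discovery_log2fc if g in replication_log2fc]
    concordant = sum(
        1 for g in shared if discovery_log2fc[g] * replication_log2fc[g] > 0)
    return concordant, len(shared)


def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): chance of at least k agreements."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Made-up values for illustration only
discovery = {"G1": 1.2, "G2": -0.8, "G3": 2.0, "G4": 1.1, "G5": -1.5}
replication = {"G1": 0.9, "G2": -0.4, "G3": 1.5, "G4": -0.2, "G6": 1.0}

k, n = sign_concordance(discovery, replication)
p_value = binomial_upper_tail(k, n)
```

With these made-up numbers, 3 of the 4 shared genes agree in direction, which is not convincing on its own. The important part is that the gene list is fixed from the original dataset before the second dataset is looked at.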

OK, I could stop at this stage and publish the results in a scientific journal.

Not always; some journals would probably still insist on experimental validation (qPCR) at this stage.

What should I do if I cannot obtain a cohort of human samples?

If you have no material left from your patients, then you need to state that as your rationale for using a publicly available dataset as an independent validation.

The bottom line to all of this (as I stated in my answer above) is that the observation you're trying to validate needs to be seen outside of that dataset. The best case is something like qPCR in a sample not in the original cohort; second to that is using a publicly available dataset as an independent validation.