3.5 years ago
Gene_MMP8
Is there a technique for pooling RNA-seq data from multiple studies and platforms and normalizing them into a single dataset? For microarray data, we already have the COCONUT package, which batch-corrects microarray datasets using control samples.
Do you have a set of studies in mind? Do they at least nominally have a set of samples that should be the same in each study?
I have shortlisted 4 RNA-seq datasets, each with a different number of samples.
You can only do batch correction across studies if they share at least one condition (the control condition, for example). This is also true of COCONUT. See this in the reference manual:
This means COCONUT assumes that the control condition is the same in every study (e.g. if the control is unperturbed HEK293 cells in one study, it must also be unperturbed HEK293 cells in the other studies).
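To make the shared-control assumption concrete, here is a minimal Python sketch of control-anchored adjustment. It is not COCONUT's actual algorithm (there is no empirical Bayes step here), just a per-gene location/scale shift that lines up each study's controls with the pooled controls and applies the same shift to that study's other samples. The function name `control_anchored_adjust` and its arguments are made up for illustration, and it assumes the input is a genes x samples matrix of log-scale expression values (e.g. log-CPM), not raw counts.

```python
import numpy as np

def control_anchored_adjust(expr, batch, is_control):
    """Hypothetical helper: align studies using their control samples.

    expr       : genes x samples array of log-scale expression (assumed)
    batch      : study label for each sample (length = n_samples)
    is_control : boolean flag per sample, True for control samples
    """
    expr = np.asarray(expr, dtype=float)
    batch = np.asarray(batch)
    is_control = np.asarray(is_control, dtype=bool)

    # Target distribution: per-gene mean and SD of all controls pooled.
    pooled = expr[:, is_control]
    target_mean = pooled.mean(axis=1, keepdims=True)
    target_sd = pooled.std(axis=1, ddof=1, keepdims=True)

    adjusted = np.empty_like(expr)
    for b in np.unique(batch):
        in_batch = batch == b
        ctrl = expr[:, in_batch & is_control]
        if ctrl.shape[1] < 2:
            raise ValueError(f"study {b!r} needs at least 2 control samples")
        # Per-gene location and scale estimated from this study's controls only.
        mu = ctrl.mean(axis=1, keepdims=True)
        sd = ctrl.std(axis=1, ddof=1, keepdims=True)
        sd = np.where(sd > 0, sd, 1.0)  # avoid dividing by zero for flat genes
        # Apply the control-derived shift to every sample in the study.
        adjusted[:, in_batch] = (expr[:, in_batch] - mu) / sd * target_sd + target_mean
    return adjusted
```

After this adjustment the controls of every study have the same per-gene mean by construction, which is only meaningful if the controls really are the same biological condition. If they are not, the shift removes real biology along with the batch effect.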
There might be other ways to do a meta-analysis when this is not true, but the right approach will depend heavily on the individual datasets and the biological question being asked.