Hi guys
I have a question. There are two different studies that each used the same array and platform, A-AFFY-44 (Affymetrix GeneChip Human Genome U133 Plus 2.0 [HG-U133_Plus_2]). Can I combine the data from these two studies?
For example, one study has good controls and the second study has a specific type of leukemia. Can I do differential expression between them?
Thanks!
Check that the same probes are measured in both datasets.
Combine the datasets, normalize the matrix, and then check for possible batch effects using PCA and MDS (i.e., whether the samples cluster by experiment).
If they do, apply a batch-correction algorithm such as sva or ComBat to get a batch-corrected matrix.
Apply PCA and MDS to the batch-corrected matrix and check whether the samples still cluster by batch. If they do, use the dataset at your own peril.
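A minimal Python sketch of the first steps above (keep only the shared probes, combine the studies, then check whether the first principal component splits the samples by study). It assumes each study is already a normalized, log-scale probes x samples NumPy array with its own list of probe IDs; the function names, probe names and simulated numbers are hypothetical. MDS and the correction step itself are not reproduced here (ComBat and sva are R/Bioconductor packages).

```python
import numpy as np

def combine_on_common_probes(probes_a, expr_a, probes_b, expr_b):
    """Keep only probes measured in both studies, then stack the samples side by side.

    expr_a and expr_b are (n_probes, n_samples) arrays whose rows follow probes_a and probes_b.
    """
    common = sorted(set(probes_a) & set(probes_b))
    row_a = {p: i for i, p in enumerate(probes_a)}
    row_b = {p: i for i, p in enumerate(probes_b)}
    a = expr_a[[row_a[p] for p in common], :]
    b = expr_b[[row_b[p] for p in common], :]
    return common, np.hstack([a, b])

def pca_scores(expr, n_components=2):
    """Project samples onto the top principal components (each probe centred across samples)."""
    x = expr.T - expr.T.mean(axis=0)  # samples x probes
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def pc_splits_studies(scores, study_labels, component=0):
    """True if the sign of one PC perfectly separates two studies (a PC's sign is arbitrary)."""
    labels = np.asarray(study_labels)
    signs = np.sign(scores[:, component])
    sign_sets = [set(signs[labels == lab]) for lab in set(labels.tolist())]
    return all(len(s) == 1 for s in sign_sets) and sign_sets[0] != sign_sets[1]

# Simulated example (made-up numbers, not real array data): 4 arrays per study.
# Study B carries a +2 technical shift on its first 50 probes and one probe that A lacks.
rng = np.random.default_rng(0)
probes_a = [f"probe_{i}" for i in range(200)]
probes_b = probes_a + ["probe_only_in_b"]
expr_a = rng.normal(8.0, 0.1, size=(200, 4))
expr_b = rng.normal(8.0, 0.1, size=(201, 4))
expr_b[:50, :] += 2.0

common, combined = combine_on_common_probes(probes_a, expr_a, probes_b, expr_b)
scores = pca_scores(combined)
labels = ["A"] * 4 + ["B"] * 4
print(len(common), combined.shape, pc_splits_studies(scores, labels))
```

If PC1 separates the samples purely by study, as it does in this simulation, that is the clustering-by-experiment pattern the steps above tell you to look for.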
Batch (i.e., study) is completely confounded with condition, so it is mathematically impossible to correct for it.
To answer the question: no, you should not combine these datasets, because you cannot tell whether the differences you see are due to biology or to batch; the two cannot be separated. Batch 1 = tumor and batch 2 = normal, so batch is the same as "condition". Technically you can combine them and run any analysis, but be aware that the results are not reliable. At best they are useful for exploratory analysis.
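The confounding argument can be made concrete with the kind of linear model used for differential expression (one coefficient for condition, one for batch). A minimal NumPy sketch with hypothetical sample counts: when every tumor sample comes from study 1 and every normal sample from study 2, the condition and batch columns of the design matrix are identical, so the matrix is rank-deficient and no fit can decide how much of a difference belongs to each.

```python
import numpy as np

def design_matrix(condition, batch):
    """Columns: intercept, condition indicator, batch indicator (one row per sample)."""
    condition = np.asarray(condition, dtype=float)
    batch = np.asarray(batch, dtype=float)
    return np.column_stack([np.ones_like(condition), condition, batch])

# Confounded layout from this thread: study 1 is all tumor, study 2 is all normal.
condition = [1, 1, 1, 0, 0, 0]  # 1 = tumor, 0 = normal
batch = [1, 1, 1, 0, 0, 0]      # 1 = study 1, 0 = study 2
X = design_matrix(condition, batch)
print(np.linalg.matrix_rank(X))  # 2, not 3: condition and batch columns are identical

# Two opposite explanations give exactly the same fitted values for a gene.
all_biology = np.array([5.0, 1.5, 0.0])  # the +1.5 comes from the condition
all_batch = np.array([5.0, 0.0, 1.5])    # the +1.5 comes from the batch
print(np.allclose(X @ all_biology, X @ all_batch))  # True

# Hypothetical balanced layout: each study contains both conditions.
X_balanced = design_matrix([1, 1, 0, 1, 0, 0], [1, 1, 1, 0, 0, 0])
print(np.linalg.matrix_rank(X_balanced))  # 3: condition and batch can be estimated separately
```

Only in a layout like the balanced one, where each batch contains both conditions, is there anything for a batch-correction method to estimate.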
What if both studies used the same probes and there were no batch effects?
Theoretically, that would mean any differences are purely biological, right? Or not?
The probes are not the problem; the issue is the exact wet-lab procedure: how they extracted the RNA, made the cDNA, etc. Yes, if there were no batch effect you could do it, but because batch = condition there is no way to diagnose it. From what I've seen in RNA-seq, you are almost guaranteed to have a massive batch effect between studies.
What if I checked for RNA degradation with the affy package? Shouldn't that give an indication of RNA quality? And can't normalization across samples compensate for any differences?
So is there no way to check for a batch difference if batch = condition?
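On the normalization question: quantile normalization (the normalization step RMA applies) forces every array to have the same overall distribution of intensities, but it keeps each probe's rank within its array. A minimal NumPy sketch on simulated, made-up data with no biological difference at all: a technical shift that raises some probes and lowers others in study 2 survives normalization and would look like differential expression. The function name and all numbers are hypothetical.

```python
import numpy as np

def quantile_normalize(expr):
    """Quantile-normalize the columns (samples) of a probes x samples matrix.

    Each sample's sorted values are replaced by the mean of the sorted columns, so all
    samples end up with exactly the same distribution. Ties are not averaged here.
    """
    order = np.argsort(expr, axis=0)
    mean_sorted = np.sort(expr, axis=0).mean(axis=1)
    normalized = np.empty(expr.shape, dtype=float)
    for j in range(expr.shape[1]):
        normalized[order[:, j], j] = mean_sorted
    return normalized

# Simulated data: 100 probes, 3 arrays per study, identical biology in both studies.
rng = np.random.default_rng(1)
levels = np.linspace(2.0, 12.0, 100)
study1 = levels[:, None] + rng.normal(0.0, 0.05, size=(100, 3))
study2 = levels[:, None] + rng.normal(0.0, 0.05, size=(100, 3))
study2[:10] += 3.0    # purely technical: some probes read higher in study 2
study2[10:20] -= 3.0  # and others read lower
qn = quantile_normalize(np.hstack([study1, study2]))

# After normalization every array has the same distribution...
same_distribution = np.allclose(np.sort(qn, axis=0), np.sort(qn, axis=0)[:, [0]])
# ...yet probes 0-9 still differ between the studies, exactly like a condition effect would.
shift_left = qn[:10, 3:].mean() - qn[:10, :3].mean()
print(same_distribution, shift_left)
```

Because study and condition coincide, nothing in the normalized matrix tells you whether such a remaining shift is technical or biological.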