Since these are on different scales, I can't do clustering. I have also tried batch correction with ComBat, but this does not work. Any suggestions?
updated 13 months ago by aUser • written 9.0 years ago by Ron
I would not try to normalize them together or make them comparable, they never will be. Instead maybe a non-negative matrix factorization approach or some similar algorithm could be helpful?
I don't think clustering or other subgroup discovery methods would really be appropriate on a combined data set. You could try applying the limma/voom normalization to the RNA-seq data: it corrects for total library size, attempts to capture the mean-variance relationship, and applies a log transformation. Then, for each data set separately, you could z-scale each gene's expression values. This might put things on a comparable scale while accounting for the mean-variance relationship within the RNA-seq data. Perhaps limit your examination to genes that are above some detection threshold in the RNA-seq data (>10 raw reads in 50% of samples, or something similar).
You could then try clustering or subgroup discovery. However, if your clustering solution consistently aligns with the two platforms, then you know some bias still exists in the data. In that case, maybe perform a separate clustering analysis for each data set instead.
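A rough sketch of the filter, transform, and per-data-set z-scaling steps described above, in plain Python. The function names, the toy values, and the `prior=0.5` offset are assumptions made for illustration; the log-CPM step only imitates the transform part of voom and does not compute its precision weights, so treat this as a sketch rather than a replacement for limma.

```python
import math
from statistics import mean, pstdev


def filter_genes(counts, min_reads=10, min_fraction=0.5):
    """Keep genes with more than min_reads raw reads in at least min_fraction of samples."""
    kept = []
    for gene, row in counts.items():
        n_detected = sum(1 for c in row if c > min_reads)
        if n_detected >= min_fraction * len(row):
            kept.append(gene)
    return kept


def log_cpm(counts, prior=0.5):
    """Log2 counts-per-million per sample (voom-like transform, no weights)."""
    genes = list(counts)
    n_samples = len(counts[genes[0]])
    lib_sizes = [sum(counts[g][j] for g in genes) for j in range(n_samples)]
    return {
        g: [
            math.log2((counts[g][j] + prior) / (lib_sizes[j] + 1.0) * 1e6)
            for j in range(n_samples)
        ]
        for g in genes
    }


def zscale_rows(expr):
    """Z-scale each gene across the samples of ONE data set."""
    scaled = {}
    for gene, row in expr.items():
        mu, sd = mean(row), pstdev(row)
        scaled[gene] = [(x - mu) / sd if sd > 0 else 0.0 for x in row]
    return scaled


# Toy data (made up): gene -> one value per sample.
rnaseq_counts = {
    "geneA": [120, 300, 80, 500],
    "geneB": [0, 3, 1, 12],
    "geneC": [40, 55, 35, 90],
}
microarray_log2 = {
    "geneA": [7.1, 8.3, 6.9, 9.0],
    "geneB": [4.2, 4.0, 4.5, 4.1],
    "geneC": [6.0, 6.4, 5.8, 6.9],
}

kept = filter_genes(rnaseq_counts)
rna_logcpm = log_cpm(rnaseq_counts)  # library sizes use all genes
shared = [g for g in kept if g in microarray_log2]

# Scale each platform separately, then place the samples side by side.
rna_z = zscale_rows({g: rna_logcpm[g] for g in shared})
array_z = zscale_rows({g: microarray_log2[g] for g in shared})
combined = {g: rna_z[g] + array_z[g] for g in shared}
print(kept)
```

If you cluster the columns of `combined` and the clusters split cleanly into the first four (RNA-seq) and last four (microarray) samples, that is the platform bias mentioned above.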
If you're interested in finding differentially expressed genes, then one acceptable approach might be to model each data set separately, using appropriate methods for each data type, and then combine the resulting test statistics using a meta-analytic method. On the other hand, a per-gene meta-analysis might require both the microarray and RNA-seq test statistics (e.g. p-value and effect size) to be produced by the same statistical test. In that case, you might consider normalizing the RNA-seq data following the limma/voom approach. Supposedly this renders the data suitable for parametric analyses, i.e. it might be appropriate to use the same statistical model as for the microarray data, which would facilitate the meta-analysis.
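To make the "combine the test statistics" idea concrete, here is a minimal sketch of one common choice, a fixed-effect inverse-variance meta-analysis applied per gene. It assumes you already have an effect size (e.g. a log2 fold change) and its standard error from each platform; the function name and the numbers are made up, and other combination methods may suit your data better.

```python
from statistics import NormalDist


def fixed_effect_meta(effects, std_errors):
    """Pool per-study effects with inverse-variance weights.

    Returns (pooled_effect, pooled_se, two_sided_p) under a normal approximation.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total
    pooled_se = (1.0 / total) ** 0.5
    z = pooled / pooled_se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return pooled, pooled_se, p


# Made-up per-gene results: (log2 fold change, standard error) per platform.
per_gene = {
    "geneA": {"microarray": (1.2, 0.4), "rnaseq": (0.9, 0.3)},
    "geneB": {"microarray": (0.1, 0.5), "rnaseq": (-0.2, 0.4)},
}

meta = {}
for gene, res in per_gene.items():
    effects = [res["microarray"][0], res["rnaseq"][0]]
    ses = [res["microarray"][1], res["rnaseq"][1]]
    meta[gene] = fixed_effect_meta(effects, ses)
    print(gene, meta[gene])
```

The more precise platform (smaller standard error) gets more weight, so the pooled estimate never requires the two data sets to share a scale for the raw expression values, only for the effect sizes.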
I think you could z-score normalise each row and then cluster all the samples together, possibly using a technique like consensus clustering to improve robustness.
I found this article quite interesting: https://peerj.com/articles/1621/. They get good results with quantile normalization (targeted, meaning that they adapt a target dataset, RNA-seq, to a reference dataset, microarray) and with TDM (Training Distribution Matching). To check the code they use, go to the Supplementary Information.
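For anyone wondering what "targeted" quantile normalization means in practice, here is a minimal sketch. It is not the article's code: the function names and values are made up, it assumes both data sets cover the same genes and are already on a log scale, and ties are simply broken by position.

```python
def reference_distribution(reference_samples):
    """Mean of the sorted values across reference samples (equal lengths assumed)."""
    sorted_samples = [sorted(s) for s in reference_samples]
    n_genes = len(sorted_samples[0])
    n_samples = len(sorted_samples)
    return [sum(s[i] for s in sorted_samples) / n_samples for i in range(n_genes)]


def quantile_normalize_to_reference(sample, ref_dist):
    """Replace each value in a target sample by the reference value of the same rank."""
    order = sorted(range(len(sample)), key=lambda i: sample[i])
    out = [0.0] * len(sample)
    for rank, idx in enumerate(order):
        out[idx] = ref_dist[rank]
    return out


# Made-up values: each inner list is one sample across the same four genes.
microarray_samples = [[5.0, 7.0, 6.0, 9.0], [5.4, 7.2, 6.2, 8.8]]
rnaseq_sample = [2.0, 11.0, 3.5, 8.0]

ref_dist = reference_distribution(microarray_samples)
normalized = quantile_normalize_to_reference(rnaseq_sample, ref_dist)
print(normalized)
```

The RNA-seq sample keeps its own gene ranking but takes on the value distribution of the microarray reference, which is the sense in which the target is adapted to the reference.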
Just in case someone else stumbles upon this issue, there is a package for integrating RNA-seq and microarray data: GEDI (https://www.biorxiv.org/content/10.1101/2021.11.11.468093v1), which can be used to process and integrate both data sets. The GEDI package uses SVA (as mentioned by @theobroma22).
Dear Ron,
I have a similar situation to yours: I want to merge RNA-Seq and microarray datasets (some of my samples come from RNA-Seq and others from microarray). Did you find a way to do that, and what were the results?
Many Thanks, Bests, Ilyes