Hi,
I know this is a very general question. I would like to compare RNA-seq expression data from human and mouse. I have a large cohort of RNA-seq data from human tumors (different subgroups) and RNA-seq data from a mouse tumor model, and I would like to test whether the tumors generated in mice cluster with any of the human subgroups.
My question is how to normalize the data between the human and mouse RNA-seq.
I have mapped the reads to the respective genomes, quantified gene expression, and merged the expression data for orthologous genes between human and mouse. When I cluster the data as they are, human and mouse samples do not cluster closely together, so I think some normalization should be applied.
I have looked at the literature, but the approaches vary widely (from no normalization at all to several normalization steps).
I am looking for any suggestion of a standard way to do this.
Thank you in advance.
If I were asked to do it, my approach would be:
You don't have my permission to state that this approach is in any way valid, though.
An addition: after step 3, modelling the differences between the two datasets and then adjusting for them may be more appropriate than Z-scaling. However, this may inadvertently wipe out whatever effect you may be modelling between mouse and human.
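For illustration only, here is a minimal Python sketch of the Z-scaling idea mentioned above, applied to the ortholog-merged data described in the question. It is not a validated pipeline. Everything named here is an assumption for the example: the function names, the `human_gene` / `mouse_gene` column names of the ortholog table, and the premise that both matrices (genes in rows, samples in columns) are already log-transformed, e.g. log2(CPM + 1).

```python
import numpy as np
import pandas as pd


def zscale_within_dataset(expr: pd.DataFrame) -> pd.DataFrame:
    """Z-scale each gene (row) across the samples (columns) of ONE dataset.

    Assumes `expr` is already log-transformed. Genes with zero variance
    cannot be scaled and are dropped.
    """
    mean = expr.mean(axis=1)
    sd = expr.std(axis=1, ddof=0).replace(0, np.nan)
    z = expr.sub(mean, axis=0).div(sd, axis=0)
    return z.dropna(axis=0)


def combine_by_orthologs(
    human: pd.DataFrame, mouse: pd.DataFrame, orthologs: pd.DataFrame
) -> pd.DataFrame:
    """Scale each species separately, then join on one-to-one orthologs.

    `orthologs` is a hypothetical two-column table with columns
    'human_gene' and 'mouse_gene'.
    """
    # Keep only pairs present in both expression matrices.
    pairs = orthologs[
        orthologs["human_gene"].isin(human.index)
        & orthologs["mouse_gene"].isin(mouse.index)
    ]
    # Keep strictly one-to-one pairs: drop any gene that appears more than once.
    pairs = pairs.drop_duplicates(subset="human_gene", keep=False)
    pairs = pairs.drop_duplicates(subset="mouse_gene", keep=False)

    human_z = zscale_within_dataset(human.loc[pairs["human_gene"]])
    mouse_z = zscale_within_dataset(mouse.loc[pairs["mouse_gene"]])

    # Rename mouse genes to their human ortholog so the indices match.
    mouse_to_human = dict(zip(pairs["mouse_gene"], pairs["human_gene"]))
    mouse_z = mouse_z.rename(index=mouse_to_human)

    # Inner join keeps only genes that survived scaling in both species.
    return human_z.join(mouse_z, how="inner")
```

The combined matrix could then be passed to whatever clustering or PCA you already use. Note that scaling each species separately removes any per-gene offset between the datasets by construction, so it carries the same caveat as above: if the mouse samples correspond to only one human subgroup, centering them on their own mean may hide exactly that relationship.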