You have three major problems:
- Completely different organisms: Mouse and Zebrafish.
- Probably different library preparation and RNA extraction methods, which introduces a technical wet-lab bias.
- Probably different bioinformatics pipelines that produced these RPKMs.
That might make it quite difficult to combine the results, so attempting to build a standardized dataset, as Kevin suggested below, is probably reasonable here.
Still, what I would try is to compare the actual biological readouts, e.g. are the same cellular pathways up- and downregulated upon that treatment? This is the most relevant (and actually the only relevant) criterion, as it is the biological readout, whereas PCA and company are "only" statistical methods. For this you would need raw data to perform differential expression analysis, or at least a list of differentially expressed genes. Another approach would be Gene Set Enrichment Analysis (GSEA): take the top differential genes from one organism as a gene set, then use the ranked results from the other organism as the query dataset and test that gene set against it.
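To make the cross-species GSEA idea a bit more concrete, here is a minimal sketch of a GSEA-style running-sum enrichment score in plain Python. It is not the official GSEA implementation, only an illustration of the principle. The function names, the toy gene IDs and the toy scores are all made up for the example, and it assumes you have already mapped the gene IDs of one organism to their orthologs in the other, so that both lists use the same identifiers.

```python
import random


def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """GSEA-style running-sum enrichment score (illustrative sketch).

    ranked_genes: gene IDs sorted from most up- to most downregulated.
    scores: ranking metric per gene in the same order (e.g. log2 fold change).
    gene_set: set of gene IDs to test, e.g. orthologs of the top
              differential genes from the other organism.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        return 0.0
    # Genes in the set contribute in proportion to |score| ** weight.
    hit_weights = [abs(s) ** weight if h else 0.0 for s, h in zip(scores, hits)]
    total = sum(hit_weights)
    if total == 0:
        return 0.0
    running, best = 0.0, 0.0
    for w, h in zip(hit_weights, hits):
        running += w / total if h else -1.0 / n_miss
        # Keep the maximum deviation from zero, with its sign.
        if abs(running) > abs(best):
            best = running
    return best


def permutation_pvalue(ranked_genes, scores, gene_set, n_perm=1000, seed=0):
    """Empirical p-value from random gene sets of the same size."""
    rng = random.Random(seed)
    present = [g for g in ranked_genes if g in gene_set]
    observed = enrichment_score(ranked_genes, scores, set(present))
    extreme = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(ranked_genes, len(present)))
        null_es = enrichment_score(ranked_genes, scores, random_set)
        if abs(null_es) >= abs(observed):
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


# Toy data (hypothetical): six genes ranked by log2 fold change.
toy_genes = ["a", "b", "c", "d", "e", "f"]
toy_scores = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
es_up = enrichment_score(toy_genes, toy_scores, {"a", "b"})
es_down = enrichment_score(toy_genes, toy_scores, {"e", "f"})
p_up = permutation_pvalue(toy_genes, toy_scores, {"a", "b"}, n_perm=200)
```

A positive score means the gene set sits near the top of the ranked list (upregulated in the query organism), a negative one that it sits near the bottom. For a real analysis I would use an established GSEA tool rather than this sketch, since those handle normalization and multiple testing properly.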
If you can get the raw data then here is what I'd try:
1) Process with identical bioinformatics pipelines.
2) Perform differential analysis.
3) Get enriched pathways per organism and compare them, either with a statistical test or simply by eye using your biological knowledge.
4) Define gene sets from both organisms (hopefully the respective genes have homologs between the species), say the top 500 most up- or downregulated genes, and perform GSEA based on these gene sets.
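As one possible reading of the "statistical test" in step 3, here is a small sketch that maps the top mouse genes to zebrafish orthologs and asks whether the overlap with the top zebrafish genes is larger than expected by chance, using a one-sided hypergeometric test. Everything in it is an assumption made for illustration: the function names, the tiny ortholog table and the gene lists are toy examples, and in practice the ortholog mapping would come from a proper resource and the universe would be all genes with an ortholog that were measured in both datasets. The same test works on pathway IDs instead of genes.

```python
from math import comb


def hypergeom_overlap_pvalue(universe_size, size_a, size_b, overlap):
    """P(X >= overlap) when drawing size_b items from a universe
    that contains size_a 'marked' items (one-sided hypergeometric test)."""
    max_k = min(size_a, size_b)
    total = comb(universe_size, size_b)
    tail = sum(
        comb(size_a, k) * comb(universe_size - size_a, size_b - k)
        for k in range(overlap, max_k + 1)
    )
    return tail / total


def compare_top_genes(mouse_top, fish_top, ortholog_map, fish_universe):
    """Map mouse genes to zebrafish orthologs and test the overlap.

    ortholog_map: dict mouse gene -> zebrafish gene (hypothetical table).
    fish_universe: zebrafish genes that have an ortholog and were
                   measured in both datasets.
    """
    mapped = {ortholog_map[g] for g in mouse_top if g in ortholog_map}
    mapped &= fish_universe
    fish = set(fish_top) & fish_universe
    shared = mapped & fish
    pval = hypergeom_overlap_pvalue(
        len(fish_universe), len(mapped), len(fish), len(shared)
    )
    return shared, pval


# Toy example (all names and lists are illustrative, not real results).
toy_orthologs = {"Tp53": "tp53", "Myc": "myca", "Stat3": "stat3", "Actb": "actb1"}
toy_universe = {
    "tp53", "myca", "stat3", "actb1", "sox2",
    "gene6", "gene7", "gene8", "gene9", "gene10",
}
shared, pval = compare_top_genes(
    mouse_top={"Tp53", "Myc", "Stat3"},
    fish_top={"tp53", "stat3", "sox2"},
    ortholog_map=toy_orthologs,
    fish_universe=toy_universe,
)
```

A small p-value would suggest that the two organisms share more top genes than expected at random, but with technical differences between the datasets I would still check by eye whether the shared genes make biological sense.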
Thank you for the detailed comment! I am comparing the actual readouts, but I was wondering whether I could show general differences/similarities using a PCA, which helps a lot visually to say "oh, they are completely different". As you said, my analysis cannot be limited to PCA, and from reading I realised that without the actual raw data it is difficult to do (but I will try Kevin's suggestion). Thank you for all the suggestions!
I like that you try to perform multiple types of analysis. In my experience, though, these high-dimensional analyses are often not conclusive, even if perfectly executed and even if you have quality data. Eventually all that matters is the biological readout, because the biology is what a reviewer looks at when you publish things. That said, if computational approaches support the biological findings, that is awesome and increases confidence, but a non-conclusive computational result is, in your case here, imho not necessarily a problem if you see clear biological effects, which maybe you can even back up with additional experiments. Feel free to try out different approaches (this is a great exercise and you learn a lot on the way), but if time is a limiting factor I would always focus more on biological readouts than on trying to put together a purely computational analysis.