5.0 years ago
Mathew Bunj
I have four gene expression datasets from different organisms that I want to compare. For a PCA comparison, would it be reasonable to use only the genes present in all datasets, so as to avoid biases (and keep interpretation easy) from genes that are present in only one or two of the datasets?
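A minimal sketch of the approach described in the question: keep only the genes shared by all datasets, stack every sample into one samples x genes matrix, and run PCA on it. It assumes each dataset is a genes x samples array with its own list of gene IDs, and that those IDs are already mapped to a common namespace across organisms (which is the orthology question raised in the reply). The function names, dataset names and gene IDs are hypothetical, and PCA is done here with NumPy's SVD rather than any particular PCA package.

```python
import numpy as np

def shared_gene_matrix(datasets):
    """Stack all samples into one samples x genes matrix, keeping only genes present in every dataset.

    datasets: dict mapping a dataset name to (gene_ids, expr), where expr has shape
    (n_genes, n_samples) and its rows follow gene_ids. Gene IDs are ASSUMED to be
    already mapped to a shared (e.g. ortholog) namespace across organisms.
    """
    common = set.intersection(*(set(ids) for ids, _ in datasets.values()))
    genes = sorted(common)  # fixed column order for the combined matrix
    blocks, labels = [], []
    for name, (ids, expr) in datasets.items():
        row_of = {g: i for i, g in enumerate(ids)}
        rows = [row_of[g] for g in genes]  # reorder rows to the shared gene order
        block = np.asarray(expr, dtype=float)[rows].T  # samples x shared genes
        blocks.append(block)
        labels.extend([name] * block.shape[0])  # remember which dataset each sample came from
    return genes, np.vstack(blocks), labels

def pca(X, n_components=2):
    """PCA via SVD of the column-centred matrix: returns sample scores and explained-variance ratios."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    var = S ** 2
    return scores, var[:n_components] / var.sum()

# Toy example with four hypothetical datasets; gene and dataset names are made up.
rng = np.random.default_rng(0)
datasets = {
    "org1": (["geneA", "geneB", "geneC", "geneD"], rng.random((4, 3))),
    "org2": (["geneA", "geneB", "geneC"], rng.random((3, 2))),
    "org3": (["geneA", "geneB", "geneC", "geneE"], rng.random((4, 3))),
    "org4": (["geneB", "geneA", "geneC"], rng.random((3, 2))),
}
genes, X, labels = shared_gene_matrix(datasets)
scores, explained = pca(X, n_components=2)
print(genes)  # ['geneA', 'geneB', 'geneC']  (geneD and geneE are dropped)
print(X.shape, scores.shape)  # (10, 3) (10, 2)
```

Note that the intersection silently discards any gene expressed in only some organisms, which is exactly the trade-off the reply below asks about; the `labels` list is kept so the PCA scores can be coloured by dataset.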
What is the goal of your PCA analysis? Can you describe the organisms and the datasets? If gene X was expressed in only 1 of the 4 organisms, and was therefore an important differentiator among your organisms, is that difference something you want to account for in your PCA analysis, or not? If yes, then you would not want to remove that gene from your analysis. Do you have confident orthology or gene identity among your 4 organisms, such that you can confidently say gene X in organism #1 is identical to gene X in organism #2? How are the quantitative expression levels normalized within and across the 4 organism datasets? If there are systematic normalization or processing differences, those will likely drive any PCA analysis.
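To illustrate the last point: one simple way to reduce dataset-level scale or offset differences before PCA is to log-transform and z-score each gene within its own dataset. This is a sketch under stated assumptions (non-negative, already-quantified values such as TPM or CPM; `log_zscore_within` is a hypothetical helper), not a substitute for a proper cross-dataset normalization or batch-correction method. It also removes genuine between-organism differences in a gene's mean expression, which may be exactly the signal you want to keep.

```python
import numpy as np

def log_zscore_within(expr, log=True):
    """Optionally log2(x + 1) transform, then z-score each gene across the samples of ONE dataset.

    expr: (n_genes, n_samples) non-negative expression values.
    Genes with zero variance within the dataset come out as 0 instead of dividing by 0.
    """
    values = np.asarray(expr, dtype=float)
    if log:
        values = np.log2(values + 1.0)
    mean = values.mean(axis=1, keepdims=True)
    sd = values.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # avoid division by zero for constant genes
    return (values - mean) / sd

# A dataset and a copy that differs only by a global scale factor (a systematic processing difference):
rng = np.random.default_rng(1)
a = rng.random((5, 4)) * 100
b = 8 * a
# Without the log step, within-dataset z-scoring removes the scale difference entirely.
print(np.allclose(log_zscore_within(a, log=False), log_zscore_within(b, log=False)))  # True
```

Each dataset would be normalized separately this way before the shared-gene matrices are stacked for PCA.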