I want to do a correlation analysis between STAR count RNA-Seq gene expression data and a continuous variable. Which method is preferred: Spearman or Pearson?
Thanks
Both the Pearson correlation coefficient and the Spearman rank correlation coefficient are commonly used to correlate RNA-Seq gene expression data (measured as counts) with a continuous variable. Which one you should use depends on the characteristics of your data.
Spearman correlation: use it when your data are not normally distributed or may contain outliers. Spearman's rank correlation is a non-parametric measure of the monotonic relationship between two variables. Because it is computed on ranks rather than raw values, it is less sensitive to outliers than Pearson correlation and does not assume a linear relationship.
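A minimal pure-Python sketch of the two points above, using made-up counts for one gene in eight samples (the data and the helper names `average_ranks`, `pearson` and `spearman` are illustrative, not from any library). It computes Spearman's rho as Pearson's r on average ranks, then shows that a single extreme sample flips Pearson's r while leaving rho unchanged, and that a monotonic but curved trend gives rho = 1 while r stays below 1.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson's r: covariance scaled by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho is Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up example: one gene in 8 samples.
x = [1, 2, 3, 4, 5, 6, 7, 8]              # continuous variable
clean = [5, 7, 3, 6, 2, 4, 1, 8]          # counts, no outlier
outlier = [5, 7, 3, 6, 2, 4, 1, 400]      # last sample is extreme

r_clean, rho_clean = pearson(x, clean), spearman(x, clean)
r_out, rho_out = pearson(x, outlier), spearman(x, outlier)
# The outlier keeps the same (largest) rank, so rho is unchanged,
# while r flips from slightly negative to clearly positive.

# Monotonic but non-linear relationship.
curved = [2 ** v for v in x]
r_curve, rho_curve = pearson(x, curved), spearman(x, curved)
# rho is 1 (perfectly monotonic); r is below 1 because the trend is not a line.
print(r_clean, rho_clean, r_out, rho_out, r_curve, rho_curve)
```

In practice a library routine such as SciPy's `spearmanr`/`pearsonr` would normally be used; the hand-written version is only there to make the rank step visible.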
Pearson correlation: use it when your data are approximately normally distributed and free of major outliers. Pearson's correlation measures the linear relationship between two variables; its usual significance test assumes the data are approximately normally distributed.
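For reference, the standard textbook definitions are given below. The notation is mine: for sample i, x_i is the continuous variable, y_i is the (normalized) expression of one gene, and R(·) denotes the rank within the n samples. Note that the formula for r itself does not require normality; that assumption enters through the usual t-test of r.

$$
r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}},
\qquad
t = r\sqrt{\frac{n-2}{1-r^2}} \sim t_{n-2}
$$

$$
\rho = r\big(R(x), R(y)\big),
\qquad
\rho = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2-1)}, \quad d_i = R(x_i) - R(y_i) \quad \text{(no ties)}
$$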
Just choose one and go with it -- each tells you something different about your data and there is no right answer. If this is such a big decision, then try both and see which one performs better on the held-out validation set.
In statistics, there are a lot of decisions where the answer is either "it depends" or "there is no right answer".
I have no idea what your classifier is or what your continuous variable is -- and even if I did, my answer would likely remain the same.
Thanks for your answer. My confusion is this: the STAR count gene expression data are harmonized but not normalized. I normalized the data before the correlation analysis, but I do not know whether Pearson should be used here. Is the normalization I did the same as what Pearson requires, given that the original data were not normalized?
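A minimal sketch of how a normalization step interacts with each method, assuming a simple log2(CPM + 1) transform (the OP does not say which normalization was used; `log2_cpm` and the numbers are hypothetical). Library-size scaling can reorder samples, so it matters for both methods. The log step is monotonic, so it leaves Spearman's rho unchanged but does change Pearson's r, which is why normalization "for Pearson" usually means both correcting for depth and reducing the skew of counts.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

def log2_cpm(counts, lib_sizes, pseudo=1.0):
    """Hypothetical helper: log2(counts per million + pseudo) for one gene."""
    return [math.log2(c / n * 1e6 + pseudo) for c, n in zip(counts, lib_sizes)]

# Made-up data: one gene in four samples with unequal sequencing depth.
x = [1.0, 2.0, 3.0, 4.0]            # continuous variable per sample
counts = [100, 200, 300, 400]       # raw counts for the gene
lib_sizes = [1e6, 4e6, 1e6, 2e6]    # total assigned reads per sample

cpm = [c / n * 1e6 for c, n in zip(counts, lib_sizes)]  # about [100, 50, 300, 200]
logcpm = log2_cpm(counts, lib_sizes)

rho_raw = spearman(x, counts)   # raw counts follow x perfectly in rank
rho_cpm = spearman(x, cpm)      # depth correction reorders samples
rho_log = spearman(x, logcpm)   # log is monotonic: same as rho_cpm
r_cpm = pearson(x, cpm)
r_log = pearson(x, logcpm)      # Pearson does change with the log step
print(rho_raw, rho_cpm, rho_log, r_cpm, r_log)
```

Dedicated tools (for example edgeR's `cpm` with `log=TRUE`, or a variance-stabilizing transform) use their own offsets and scaling, so their values will differ from this simplified version.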