Question

RNA seq, correlation, gene expression, STAR count

0

12 months ago

Rob ▴ 170

I want to run a correlation analysis between STAR count RNA-Seq gene expression data and a continuous variable. Which method is preferred, Spearman or Pearson?

Thanks

STAR correlation RNA-seq • 1.4k views

score 3 · Accepted Answer · 2024-01-27

Both the Pearson correlation coefficient and the Spearman rank correlation coefficient are frequently used to correlate RNA-Seq gene expression data (measured as counts) with a continuous variable. Which one to use depends on the features of your data.

Spearman correlation: Use it when your data may contain outliers or is not normally distributed. Spearman's rank correlation is a non-parametric measure that evaluates the monotonic relationship between variables. Compared to Pearson correlation, it is less sensitive to outliers and does not assume a linear relationship.

Pearson correlation: Use it when your data is roughly normally distributed and free of major outliers. Pearson's correlation measures the strength of the linear relationship between variables, and its significance test assumes approximately normally distributed data.
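To make the difference concrete, here is a minimal sketch in plain Python that computes both coefficients on a small made-up example. The values in `counts` and `trait` are invented for illustration and are not real data. The helpers `pearson`, `average_ranks`, and `spearman` are hypothetical names written for this sketch. In practice you would more likely call `scipy.stats.pearsonr` and `scipy.stats.spearmanr`.

```python
import math

def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up example: counts double with each unit of the trait,
# so the relationship is monotonic but not linear.
counts = [1, 2, 4, 8, 16, 32, 64, 128]
trait = [1, 2, 3, 4, 5, 6, 7, 8]
log_counts = [math.log2(c + 1) for c in counts]

print("Pearson, raw counts:   ", round(pearson(counts, trait), 3))
print("Spearman, raw counts:  ", round(spearman(counts, trait), 3))
print("Pearson, log2(count+1):", round(pearson(log_counts, trait), 3))
```

On this toy input Spearman is at its maximum because the ordering is perfectly preserved, while Pearson on the raw counts is noticeably lower. After a log transform the relationship is close to linear and Pearson rises accordingly.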

3

12 months ago

dsull ★ 7.2k

Use both and report both.

0

Thanks for your response. But they give me different numbers of significantly correlated genes. I want to use the genes as classifiers to develop a model, so I have to choose one method.

12 months ago by Rob ▴ 170

2

Just choose one and go with it -- each tells you something different about your data and there is no right answer. If this is such a big decision, then try both and see which one performs better on the held-out validation set.

In statistics, there are a lot of decisions where the answer is either "it depends" or "there is no right answer".

I have no idea what your classifier is or what your continuous variable is -- and even if I did, my answer would likely remain the same.
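The suggestion to try both methods and compare them on held-out data could look roughly like the sketch below. Everything in it is an assumption made for illustration: the data are simulated, the model is a one-variable least-squares fit on a signed sum of the selected genes, and the names `top_genes`, `signature`, and `validation_mse` are hypothetical. The one point it is meant to show is that gene selection uses the training samples only, so the validation error is a fair comparison.

```python
import math
import random

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def ranks(v):
    # Average ranks, so tied values share the same rank.
    order = sorted(range(len(v)), key=v.__getitem__)
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

def top_genes(expr, trait, samples, corr, k):
    """Pick the k genes with the largest |correlation|, using only `samples`."""
    y = [trait[s] for s in samples]
    scored = []
    for gene, values in expr.items():
        r = corr([values[s] for s in samples], y)
        scored.append((abs(r), gene, 1 if r >= 0 else -1))
    scored.sort(reverse=True)
    return [(gene, sign) for _, gene, sign in scored[:k]]

def signature(expr, genes, sample):
    """Signed sum of the selected genes' expression for one sample."""
    return sum(sign * expr[gene][sample] for gene, sign in genes)

def validation_mse(expr, trait, train, valid, corr, k=5):
    """Select genes and fit a line on train, then report error on valid."""
    genes = top_genes(expr, trait, train, corr, k)
    xs = [signature(expr, genes, s) for s in train]
    ys = [trait[s] for s in train]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    errors = [(trait[s] - (intercept + slope * signature(expr, genes, s))) ** 2
              for s in valid]
    return sum(errors) / len(errors)

# Simulated data (assumption): 40 genes, the first 5 track the trait.
random.seed(0)
n_samples, n_genes = 60, 40
trait = [random.uniform(0, 10) for _ in range(n_samples)]
expr = {}
for g in range(n_genes):
    signal = 0.8 if g < 5 else 0.0
    counts = [int(2 ** (signal * t + random.gauss(4, 1))) for t in trait]
    expr[f"gene{g}"] = [math.log2(c + 1) for c in counts]

samples = list(range(n_samples))
random.shuffle(samples)
train, valid = samples[:40], samples[40:]

results = {name: validation_mse(expr, trait, train, valid, fn)
           for name, fn in [("pearson", pearson), ("spearman", spearman)]}
print(results)  # lower validation error is better
```

With real data, the signature score and linear fit would be replaced by whatever model is actually being built. The split into training and validation samples is the part that carries over.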