Dealing with variable gene expression and outliers
3.1 years ago
KitScorpion ▴ 10

I want to identify genes whose mean expression may be unrepresentative because of high variability or outliers.

Specifically, I am working with single-cell data that is already pre-processed (normalized and clustered into cell types). For my downstream analysis, I need to use the mean expression per gene and per cluster.
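To make the setup concrete, here is a minimal sketch of the per-gene, per-cluster mean in plain Python. The data layout (a dict of gene name to per-cell values plus a parallel list of cluster labels), the function name and the toy values are all assumptions for illustration, not my actual data structures.

```python
from collections import defaultdict
from statistics import fmean


def mean_per_gene_and_cluster(expr, clusters):
    """Return {gene: {cluster: mean}}.

    expr: dict mapping gene name -> list of normalized values, one per cell.
    clusters: list of cluster labels, one per cell, in the same cell order.
    (Hypothetical layout, chosen only to keep the example small.)
    """
    means = {}
    for gene, values in expr.items():
        by_cluster = defaultdict(list)
        # Group this gene's values by the cluster label of each cell.
        for value, label in zip(values, clusters):
            by_cluster[label].append(value)
        means[gene] = {label: fmean(vals) for label, vals in by_cluster.items()}
    return means


# Toy data: five cells, two clusters, one gene.
clusters = ["T", "T", "T", "B", "B"]
expr = {"GeneA": [1.0, 2.0, 3.0, 0.0, 4.0]}
means = mean_per_gene_and_cluster(expr, clusters)
```

Note that both clusters end up with the same mean here even though the "B" cells are far more spread out, which is exactly the situation I am worried about.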

However, the question arose whether outliers could skew the mean and affect my downstream analysis. Since I have to use the mean, the idea is to exclude specific genes in specific clusters where outliers might be skewing it.

I'm struggling to work out how to go about this. I've tried or considered several things, but can't come up with a good systematic approach.

  • Idea 1: For each cluster and each gene, check whether a boxplot shows outliers (values above Q3 + 1.5 × IQR or below Q1 - 1.5 × IQR). This may be too strict though, as one outlier among 200 single cells would barely affect the mean.
  • Idea 2: Check the mean and the standard deviation of each gene per cluster. If a gene's standard deviation is above a certain threshold (e.g. 1.6 × the mean, i.e. a coefficient of variation above 1.6), exclude that gene. I am unsure whether this is a good approach and which threshold would be appropriate.
  • Idea 3: Check the variance of all genes per cluster and identify outliers at that level (e.g. with the boxplot approach: if the variance of a gene is greater than Q3 + 1.5 × IQR of the variances of all genes in my data, exclude it). I am unsure whether this is a good approach, and whether it may be biased because the variability of gene expression differs biologically between genes.
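To make the three ideas easier to compare, here is a rough sketch of each in plain Python for the values of one cluster. All function names, the toy data and the default `k=1.5` are assumptions for illustration; the quartiles come from `statistics.quantiles` with `method="inclusive"`, which may differ slightly from what a given plotting library uses. For idea 1 the sketch also reports how much the mean moves when the boxplot outliers are dropped, which is my own addition to address the "one outlier in 200 cells" concern.

```python
from statistics import fmean, quantiles, stdev, variance


def tukey_fences(values, k=1.5):
    """Return the boxplot fences (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def mean_shift_from_outliers(values, k=1.5):
    """Idea 1: relative change of the mean when boxplot outliers are dropped."""
    lower, upper = tukey_fences(values, k)
    kept = [v for v in values if lower <= v <= upper]
    full_mean = fmean(values)
    if full_mean == 0:
        return 0.0
    return abs(full_mean - fmean(kept)) / abs(full_mean)


def coefficient_of_variation(values):
    """Idea 2: sd / mean, so 'sd > 1.6 * mean' is the same as 'CV > 1.6'."""
    m = fmean(values)
    return stdev(values) / m if m > 0 else float("inf")


def high_variance_genes(cluster_expr, k=1.5):
    """Idea 3: genes whose variance lies above the upper fence of all gene variances."""
    variances = {gene: variance(vals) for gene, vals in cluster_expr.items()}
    _, upper = tukey_fences(list(variances.values()), k)
    return {gene for gene, var in variances.items() if var > upper}


# Toy cluster: four fairly stable genes and one gene driven by a single cell.
cluster_expr = {
    "g1": [1.0, 2.0, 3.0, 2.0, 2.0],
    "g2": [2.0, 2.0, 3.0, 2.0, 1.0],
    "g3": [1.0, 1.0, 2.0, 2.0, 1.0],
    "g4": [3.0, 3.0, 3.0, 4.0, 3.0],
    "spiky": [0.0, 0.0, 0.0, 0.0, 10.0],
}

shift_g1 = mean_shift_from_outliers(cluster_expr["g1"])
shift_spiky = mean_shift_from_outliers(cluster_expr["spiky"])
cv_g1 = coefficient_of_variation(cluster_expr["g1"])
cv_spiky = coefficient_of_variation(cluster_expr["spiky"])
flagged = high_variance_genes(cluster_expr)
```

In this toy example the boxplot rule marks values in both `g1` and `spiky` as outliers, but only for `spiky` does removing them change the mean, and only `spiky` exceeds the CV threshold of 1.6 or gets flagged by the variance-based rule.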

I feel like there must be an established approach for this, but I don't know of one. At the moment I'm leaning towards idea 2 ...

I'd be very grateful for any tips or feedback.

single-cell rna-seq variability statistics • 461 views