Statistical test for paired RNA data with skewed difference distribution
8 months ago
DJHS • 0

I am currently working on a research project involving RNA-seq data analysis and have encountered a statistical challenge that I hope to receive your insights and suggestions on.

I have two sets of RNA-seq data (normalised for batch effects and log-transformed) that I need to compare, and both have skewed distributions. The first is skewed to the right (with most values above 7 on a log2 scale), while the values in the second are higher but skewed to the left. These skewed distributions violate not only the assumption of normality but also the assumption of symmetry that traditional paired tests, such as the paired Wilcoxon signed-rank test, rely on.

It's worth mentioning that these two datasets represent different genes, and my goal is not a differential expression analysis but rather a comparative study: I want to assess the difference in expression between two specific genes within the same experimental condition. Packages such as edgeR and DESeq2 therefore don't really fit my needs.

With over 300 samples in my dataset, I am looking for robust statistical methods or alternative approaches that can handle skewed data distributions and allow for a meaningful comparison between the two datasets.

I would greatly appreciate any insights or recommendations you might have regarding suitable statistical techniques or creative solutions to tackle this challenge.

Statistical-Analysis RNA-seq • 549 views

Are these datasets completely independent to each other? Please describe the datasets more.

No. Each row is a sample and each column is a gene, so the values for the two genes in a given row come from the same sample. The RNA-seq values of the two genes are therefore paired.

Here is a better breakdown of the dataset:

  • The data is composed of RNA-seq data from cancer patients
  • The values are normalised and log2 transformed
  • The columns represent the genes
  • The rows represent the samples
  • Distribution of gene of interest A is skewed to the right
  • Distribution of gene of interest B is skewed to the left

My goal is to compare the expression level of the 2 genes across all samples given that they are paired.
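Since the samples are paired, the quantity a paired test works on is the per-sample difference between the two genes, so it is the shape of those differences that matters rather than the shape of each gene's own distribution. A minimal sketch of how that could be checked, assuming the two columns are available as equal-length arrays (the names `gene_a` and `gene_b`, the helper function, and the simulated values are all placeholders, not the real data):

```python
import numpy as np
from scipy import stats


def summarize_paired_differences(gene_a, gene_b):
    """Summarise per-sample differences (B - A) on the log2 scale."""
    a = np.asarray(gene_a, dtype=float)
    b = np.asarray(gene_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("gene_a and gene_b must have one value per sample")
    diff = b - a  # one difference per sample (row)
    return {
        "n": int(diff.size),
        "median_diff": float(np.median(diff)),
        "mean_diff": float(np.mean(diff)),
        # Skewness near 0 suggests the differences are roughly symmetric
        "skewness": float(stats.skew(diff)),
    }


# Simulated placeholder data: A right-skewed above 7, B higher and left-skewed
rng = np.random.default_rng(0)
gene_a = 7 + rng.gamma(shape=2.0, scale=0.5, size=300)
gene_b = 12 - rng.gamma(shape=2.0, scale=0.5, size=300)

summary = summarize_paired_differences(gene_a, gene_b)
print(summary)
```

On a log2 scale, a difference of 1 corresponds to a two-fold difference in normalised expression, so the median difference can be read directly as a log2 fold change between the two genes.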

So basically gene 1 vs gene 2. I would just do a paired Wilcoxon (signed-rank) test. It does not assume normality. That said, I personally think that expression levels of different genes should not be compared directly anyway, because differences can be technical, for example GC bias, mappability, etc.
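A minimal sketch of what the suggested test could look like in Python with SciPy, assuming two equal-length arrays of log2 expression values. The sign test and the bootstrap interval are additions not mentioned in this thread; they are included only because neither relies on the differences being symmetric. The function name, the simulated data, and the parameter defaults are hypothetical, and `scipy.stats.binomtest` needs SciPy 1.7 or later.

```python
import numpy as np
from scipy import stats


def compare_paired_genes(gene_a, gene_b, n_boot=2000, seed=0):
    """Paired comparison of two genes measured in the same samples."""
    a = np.asarray(gene_a, dtype=float)
    b = np.asarray(gene_b, dtype=float)
    diff = b - a
    nonzero = diff[diff != 0]
    if nonzero.size == 0:
        raise ValueError("all paired differences are zero")

    # Wilcoxon signed-rank test on the paired values
    # (SciPy's default zero_method drops zero differences)
    wilcoxon_res = stats.wilcoxon(b, a)

    # Sign test: under the null, positive and negative differences
    # are equally likely; no symmetry assumption is needed
    n_pos = int(np.sum(nonzero > 0))
    sign_res = stats.binomtest(n_pos, n=int(nonzero.size), p=0.5)

    # Percentile bootstrap 95% interval for the median difference,
    # resampling samples (rows) with replacement
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, diff.size, size=(n_boot, diff.size))
    boot_medians = np.median(diff[idx], axis=1)
    ci_low, ci_high = np.percentile(boot_medians, [2.5, 97.5])

    return {
        "median_diff": float(np.median(diff)),
        "wilcoxon_p": float(wilcoxon_res.pvalue),
        "sign_test_p": float(sign_res.pvalue),
        "median_ci": (float(ci_low), float(ci_high)),
    }


# Simulated placeholder data, not real expression values
rng = np.random.default_rng(1)
gene_a = 7 + rng.gamma(shape=2.0, scale=0.5, size=300)
gene_b = 12 - rng.gamma(shape=2.0, scale=0.5, size=300)

result = compare_paired_genes(gene_a, gene_b)
print(result)
```

With more than 300 paired samples, even a very small difference is likely to come out as statistically significant, so the median difference and its interval are probably more informative than the p-values alone. None of these tests addresses the technical caveat raised above (GC bias, mappability), which affects the interpretation of any between-gene comparison.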
