Question

Statistical test for paired RNA data with skewed difference distribution

11 months ago

DJHS • 0

I am working on a research project involving RNA-seq data analysis and have run into a statistical challenge on which I would appreciate your insights and suggestions.

I have two sets of RNA-seq data (normalised for batch effects and log2 transformed) that I need to compare, and both sets have skewed distributions. The first is skewed to the right (with most values above 7 on the log2 scale), while the values in the second are higher but skewed to the left. These skewed distributions violate not only the assumption of normality but also the assumption of symmetry that traditional paired tests such as the paired Wilcoxon signed-rank test rely on.

It's worth mentioning that these two datasets represent different genes, and my goal is not a differential expression analysis but rather a comparative study. I want to assess the difference in expression between two specific genes within the same experimental condition. Therefore packages such as edgeR and DESeq2 don't really fit my needs.

With over 300 samples in my dataset, I am looking for robust statistical methods or alternative approaches that can handle skewed data distributions and allow for a meaningful comparison between the two datasets.

I would greatly appreciate any insights or recommendations you might have regarding suitable statistical techniques or creative solutions to tackle this challenge.

Statistical-Analysis RNA-seq • 607 views

ADD COMMENT • link updated 11 months ago by Ram 44k • written 11 months ago by DJHS • 0

Are these datasets completely independent of each other? Please describe the datasets in more detail.

ADD REPLY • link 11 months ago by ATpoint 86k

No. Within each row, the RNA-seq values of all genes (columns) come from the same sample, so the values of the two genes are paired.

Here is a better breakdown of the dataset:

The data is composed of RNA-seq data from cancer patients
The values are normalised and log2 transformed
The columns represent the genes
The rows represent the samples
Distribution of gene of interest A is skewed to the right
Distribution of gene of interest B is skewed to the left

My goal is to compare the expression level of the 2 genes across all samples given that they are paired.
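Given that layout, the pairing can be made explicit by taking the per-sample difference between the two genes. The following is only a minimal sketch using the Python standard library: the function names and the five example values are hypothetical and made up for illustration, not taken from the real data. It also reports the skewness of the differences, since paired tests operate on the differences rather than on each gene's own distribution.

```python
import statistics


def paired_differences(gene_a, gene_b):
    """Per-sample difference B - A for two equally long lists of log2 values."""
    if len(gene_a) != len(gene_b):
        raise ValueError("both genes must be measured in the same samples")
    return [b - a for a, b in zip(gene_a, gene_b)]


def sample_skewness(values):
    """Moment-based skewness m3 / m2**1.5 (0 for a perfectly symmetric sample)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical log2 expression values for five samples (illustration only).
gene_a = [7.1, 7.4, 7.2, 9.0, 7.3]
gene_b = [10.2, 10.5, 8.1, 10.4, 10.3]

diffs = paired_differences(gene_a, gene_b)
print("median difference:", statistics.median(diffs))
print("skewness of differences:", sample_skewness(diffs))
```

On a log2 scale each difference is a per-sample log2 ratio of gene B to gene A, so a median difference of 1 would correspond to roughly a two-fold higher level of B.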

ADD REPLY • link 11 months ago by DJHS • 0

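So basically gene 1 vs gene 2. I would just do a Wilcoxon signed-rank test. It does not assume normality (the paired version only assumes that the per-sample differences are roughly symmetric around their median). That said, I personally think that expression levels of different genes should not be compared anyway, because differences can be technical, for example due to GC bias or mappability.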
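A minimal sketch of the suggested test, written with the Python standard library only so that it is self-contained. It uses the normal approximation for the signed-rank statistic, which is reasonable with around 300 samples. The function names and the eight example values are hypothetical. An exact sign test is included as a fallback because it does not require the differences to be symmetric. In practice a library routine such as `scipy.stats.wilcoxon` could be used instead of the hand-written version.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided paired Wilcoxon signed-rank test (normal approximation).

    Returns (w_plus, p_value). Zero differences are dropped and tied
    absolute differences receive average ranks.
    """
    diffs = [b - a for a, b in zip(x, y) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return w_plus, p_value


def sign_test(x, y):
    """Two-sided exact sign test; returns (n_positive, n_negative, p_value)."""
    pos = sum(1 for a, b in zip(x, y) if b > a)
    neg = sum(1 for a, b in zip(x, y) if b < a)
    n = pos + neg
    k = min(pos, neg)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return pos, neg, min(1.0, 2 * tail)


# Hypothetical log2 values for eight samples (illustration only).
gene_1 = [7.1, 7.4, 7.2, 9.0, 7.3, 7.0, 7.6, 7.2]
gene_2 = [10.2, 10.5, 8.1, 10.4, 10.3, 9.9, 10.1, 10.6]

w_plus, p_wilcoxon = wilcoxon_signed_rank(gene_1, gene_2)
n_pos, n_neg, p_sign = sign_test(gene_1, gene_2)
print("Wilcoxon W+ =", w_plus, "p =", p_wilcoxon)
print("Sign test:", n_pos, "positive,", n_neg, "negative, p =", p_sign)
```

If the per-sample differences are clearly skewed, the sign test result is the safer one to report as a test of the median difference; the signed-rank test is more powerful when the symmetry assumption roughly holds.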

ADD REPLY • link 11 months ago by ATpoint 86k