Question

Filter count matrix (with replicates) based on variance

3.7 years ago

annaA ▴ 10

Hello,

I have a count matrix (genes × samples). My samples are grouped by day (day 0 to day 4), and for each day I have 7 replicates. I want to filter based on variance (exclude the genes with low variance). I know how to calculate row-wise variance, but now I want to calculate the variance between the days (subsets of samples) rather than across the individual samples.

Here is my data

The code I have used for calculating the variance is the following. How can I edit it to take the variance between days rather than between individual samples?

# Row-wise variance across all samples (assumes every column of data is numeric)
gene_var <- apply(data, 1, var)
# Keep the 50% most variable genes; no helper column has to be added and removed
data2 <- data[gene_var >= quantile(gene_var, 0.50), ]

Thanks in advance, Anna

filter RNA-Seq count-matrix • 1.6k views

ADD COMMENT • link 3.7 years ago by annaA ▴ 10

Are these raw or normalized data? You probably want to normalize and log2-transform them, because otherwise 1) differences in sequencing depth will confound your analysis (I guess), and 2) most importantly, larger counts have more variance than smaller ones on the raw scale, which is why a log transformation is commonly used to counter that effect. For simplicity, DESeq2::vst or DESeq2::rlog, which handle both normalization and transformation, might be of interest here.
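To illustrate the two effects mentioned above, here is a minimal base-R sketch. It is not DESeq2::vst or DESeq2::rlog but a simpler stand-in (counts per million followed by log2 with a pseudocount of 1), and the toy matrix `counts` is made up for illustration.

```r
# Toy counts (hypothetical): 4 genes x 3 samples; s3 was sequenced about twice as deeply as s1.
counts <- matrix(c(10, 100, 0, 50,
                   12, 110, 1, 45,
                   20, 200, 0, 100),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

# Scale each sample to counts per million so that depth differences drop out.
lib_size <- colSums(counts)
cpm <- sweep(counts, 2, lib_size, "/") * 1e6

# log2 with a pseudocount of 1 to damp the dependence of variance on the mean.
log_cpm <- log2(cpm + 1)
```

After scaling, s1 and s3 have identical values even though their raw counts differ by a factor of two, so a variance computed on `log_cpm` no longer picks up that depth difference.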

ADD REPLY • link 3.7 years ago by ATpoint 85k

Hello,

Yeah, I know what you mean, but currently I want to use a specific R package (wTO), and the developer suggested that I use unnormalized data as well; otherwise I use DESeq2 for normalization. Any hint about the variance? :) Anna

ADD REPLY • link 3.7 years ago by annaA ▴ 10

How about modeling the mean/variance relationship and then simply blocking for the group information?

https://rdrr.io/github/MarioniLab/scran/man/modelGeneVar.html

This function was written for single-cell data, but essentially it takes an expression matrix as input together with a factor indicating the groups. I guess that may work here. It usually takes normalized counts on the log scale and then models the variance as a function of the mean expression. I cannot comment on the package you mention.
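If "variance between days" is taken to mean the variance of the per-day averages, a plain base-R alternative (not the scran approach) is to collapse the replicates of each day to their mean and then take the row-wise variance of those means. The sketch below assumes 5 days with 7 replicates each and columns ordered by day; the names `expr`, `day`, `day_means` and the simulated values are hypothetical placeholders for the real matrix.

```r
# Hypothetical layout: 5 days x 7 replicates, columns ordered by day.
set.seed(1)
day <- factor(rep(paste0("day", 0:4), each = 7))
expr <- matrix(rnorm(100 * 35), nrow = 100,
               dimnames = list(paste0("gene", 1:100), NULL))  # simulated values, not real data

# Average the replicates within each day: one column of means per day.
day_means <- sapply(levels(day), function(d) rowMeans(expr[, day == d, drop = FALSE]))

# Per gene, the variance across the five day means.
between_day_var <- apply(day_means, 1, var)

# Keep the 50% of genes that vary most between days.
keep <- between_day_var >= quantile(between_day_var, 0.50)
expr_filtered <- expr[keep, , drop = FALSE]
```

With real data, `expr` would be replaced by the count matrix (raw or transformed, depending on what the downstream package expects) and `day` by a factor matching its column order.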

ADD REPLY • link 3.7 years ago by ATpoint 85k