3.7 years ago
annaA
Hello,
I have a count matrix (genes × samples). My samples can be divided by day (day 0 to day 4), and for each day I have 7 replicates. I want to filter based on variance (exclude the genes with low variance). I know how to calculate row-wise variance, but now I want to calculate the variance between the days (subsets of samples) rather than across all individual samples.
The code I have used for calculating the variance is below. How can I edit it to take the variance between days instead of between individual samples?
data$variance = apply(data, 1, var)
data2 = data[data$variance >= quantile(data$variance, c(.50)), ] #50% most variable genes
data2$variance <- NULL
Thanks in advance, Anna
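One possible reading of "variance between days" is to average the 7 replicates of each day per gene and then take the variance of the resulting five day means. The base-R sketch below illustrates that idea under stated assumptions: `counts` and `day` are hypothetical stand-ins (simulated here) for the real count matrix and a per-sample day label, and the 50% cutoff simply mirrors the code above.

```r
# Simulated stand-in for the real data: 100 genes x 35 samples
# (5 days x 7 replicates). `counts` and `day` are hypothetical names.
set.seed(1)
day <- factor(rep(paste0("day", 0:4), each = 7))
counts <- matrix(rpois(100 * 35, lambda = 50), nrow = 100,
                 dimnames = list(paste0("gene", 1:100),
                                 paste0(day, "_rep", rep(1:7, times = 5))))

# Per-gene mean within each day: a genes x days matrix
day_means <- sapply(levels(day), function(d) {
  rowMeans(counts[, day == d, drop = FALSE])
})

# Variance across the five day means, one value per gene
between_day_var <- apply(day_means, 1, var)

# Keep the genes at or above the median between-day variance
keep <- between_day_var >= quantile(between_day_var, 0.50)
filtered <- counts[keep, , drop = FALSE]
```

The filter is applied to the original matrix, so no temporary variance column has to be added and removed again.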
Are these raw data or normalized? You probably want to normalize and log2-scale these data, as otherwise 1) the sequencing depth differences confound your analysis (I guess) and 2), most importantly, larger counts (on that scale) have more variance than smaller ones; log-transformation is commonly used to counter that effect. For simplicity, DESeq2::vst or DESeq2::rlog for normalization and transformation might be of interest here.
Hello,
Yeah, I know what you mean, but currently I want to use a specific R package (wTO), and the developer suggested that I use unnormalized data as well; otherwise I use DESeq2 for normalization. Any hint about the variance? :) Anna
How about modeling the mean/variance relationship and then simply blocking on the group information?
https://rdrr.io/github/MarioniLab/scran/man/modelGeneVar.html
This function was written for single-cell data, but essentially it takes a count matrix as input along with a factor indicating the groups. I guess that may work here. I cannot comment on the package you mention. The function I link usually takes normalized counts on the log scale as input and then models the variance as a function of the mean expression.
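As a rough illustration of what blocking on the day means, the base-R sketch below computes each gene's variance within each day on a log2 scale and averages those values across days. It is not modelGeneVar: no mean/variance trend is fitted, and the pseudocount of 1 and the lack of depth normalization are simplifying assumptions. `counts` and `day` are hypothetical, simulated stand-ins for the real data. Note that a blocked variance of this kind describes variability within days, which is a different quantity from the variance between day means.

```r
# Simulated stand-in: 100 genes x 35 samples (5 days x 7 replicates)
set.seed(1)
day <- factor(rep(paste0("day", 0:4), each = 7))
counts <- matrix(rpois(100 * 35, lambda = 50), nrow = 100)

# Simple log2 transform with a pseudocount (assumption: no size factors)
log_counts <- log2(counts + 1)

# Variance of each gene within each day: a genes x days matrix
within_var <- sapply(levels(day), function(d) {
  apply(log_counts[, day == d, drop = FALSE], 1, var)
})

# Average the within-day variances across days ("blocked" variance)
blocked_var <- rowMeans(within_var)

# Mean log expression per gene, for inspecting the mean/variance relationship
gene_mean <- rowMeans(log_counts)
```

Plotting `blocked_var` against `gene_mean` gives a quick view of how variance changes with expression level before any trend is fitted.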