Hi,
What is the current best practice for normalizing gene expression counts if I want to compare characteristics of genes, and particularly of gene groups (min, max, mean and sd of expression), between two conditions? I'm interested in questions like: "Is the variance of the expression means in condition A larger than in condition B for a specific gene group?" That is, genes in group X have more variable mean expression in A than in B, while this is not true for gene group Y.
I guess I have to normalize for library size and gene length, and also correct for the mean-variance dependence of expression.
Maybe `vst` plus an RPKM or TPM transformation? Any other suggestions? I'm not sure whether I can do an RPKM or TPM transformation after `vst`.
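For reference, here is a minimal base-R sketch of the TPM calculation mentioned above. It assumes gene lengths in base pairs are available. The `gene_length` values are made up for illustration, and `counts_to_tpm` is a hypothetical helper, not a function from an existing package. The final `log2(TPM + 1)` step is only a simple stand-in for variance stabilization. It is not DESeq2's `vst`.

```r
set.seed(1)
# Toy count matrix: 20 genes x 5 samples, same shape as the example dataset
counts <- matrix(rpois(100, lambda = 10), ncol = 5,
                 dimnames = list(paste0("gene", 1:20),
                                 c("A1", "A2", "A3", "B1", "B2")))

# Hypothetical gene lengths in base pairs (replace with real annotation)
gene_length <- setNames(sample(500:5000, 20), rownames(counts))

# TPM: divide each gene's counts by its length in kb, then rescale
# every sample (column) so that it sums to one million
counts_to_tpm <- function(counts, gene_length) {
  rate <- counts / (gene_length / 1000)  # row-wise division: reads per kilobase
  t(t(rate) / colSums(rate)) * 1e6       # per-sample scaling to 1e6
}

tpm <- counts_to_tpm(counts, gene_length)

# Crude variance-reducing transform: log2 with a pseudocount of 1
log_tpm <- log2(tpm + 1)
```

Because TPM normalizes for length before scaling to per-million, every sample sums to the same total (1e6). RPKM scales by library size first and length second, so its per-sample totals differ.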
This is an example dataset:
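```r
set.seed(1)  # make the simulated counts reproducible
# 20 genes x 5 samples of Poisson counts: 3 samples in condition A, 2 in B
data <- as.data.frame(matrix(rpois(100, lambda = 10), ncol = 5))
colnames(data) <- c("A1", "A2", "A3", "B1", "B2")
genes <- paste0("gene", 1:20)
gene_group <- c(rep("X", 15), rep("Y", 5))  # 15 genes in group X, 5 in group Y
data <- cbind(data, genes, gene_group)
```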
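One way to compute the requested group-level statistics on this dataset is sketched below. It is an illustration under stated assumptions, not an established recommendation. It assumes samples A1 to A3 belong to condition A and B1 and B2 to condition B. It uses log2 CPM with a pseudocount of 1 as the normalization, which corrects for library size but ignores gene length. If lengths are available, a length-aware matrix such as log2(TPM + 1) could be substituted for `log_cpm`. The object names (`gene_means`, `group_var`, `summarise_means`) are hypothetical.

```r
set.seed(1)
# Rebuild the example dataset from the question
data <- as.data.frame(matrix(rpois(100, lambda = 10), ncol = 5))
colnames(data) <- c("A1", "A2", "A3", "B1", "B2")
data$genes <- paste0("gene", 1:20)
data$gene_group <- c(rep("X", 15), rep("Y", 5))

samples <- c("A1", "A2", "A3", "B1", "B2")
condition <- c("A", "A", "A", "B", "B")  # assumed sample-to-condition mapping

# Library-size normalization: counts per million, then log2 with pseudocount 1
counts <- as.matrix(data[, samples])
log_cpm <- log2(t(t(counts) / colSums(counts)) * 1e6 + 1)

# Per-gene mean within each condition: a genes x conditions matrix
gene_means <- sapply(c(A = "A", B = "B"), function(cond)
  rowMeans(log_cpm[, condition == cond, drop = FALSE]))

# Variance of the gene means: gene groups in rows, conditions in columns
group_var <- sapply(c(A = "A", B = "B"), function(cond)
  vapply(split(gene_means[, cond], data$gene_group), var, numeric(1)))

# Min, max, mean and sd of the gene means per gene group, for each condition
summarise_means <- function(x)
  c(min = min(x), max = max(x), mean = mean(x), sd = sd(x))
group_summary <- lapply(c(A = "A", B = "B"), function(cond)
  t(sapply(split(gene_means[, cond], data$gene_group), summarise_means)))

print(group_var)
print(group_summary)
```

Comparing `group_var[, "A"]` with `group_var[, "B"]` answers the descriptive question. With only 3 and 2 replicates, however, the per-gene means are noisy. Deciding whether a difference is meaningful would need a separate test, for example one based on permuting sample labels.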