Hi,
I have two corresponding count tables of RNA-seq count data.
Example (the actual matrices are much larger):
Matrix A:
3 3 6 2 0 0
15 10 0 8 0 0
0 0 0 0 0 0
0 0 0 0 0 0
3 14 2 2 2 7
Matrix B:
29.0 31.5 27 29.5 0.0 0
33.0 37.5 0 34.0 0.0 0
0.0 0.0 0 0.0 0.0 0
0.0 0.0 0 0.0 0.0 0
24.5 22.5 26 23.5 23.5 24
The second matrix is the average of two integer count tables, so it contains values ending in .5.
The two matrices correspond to each other: wherever the first matrix has a 0, the second matrix has a 0 too.
I want to normalize the first matrix by the second, i.e., divide the first matrix element-wise by the second. The problems are:
- the data are extremely skewed, with a lot of 0s.
- the numbers in the second matrix are generally much larger than those in the first. After division, I end up with a matrix containing very tiny numbers and 0s.
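The element-wise division described above can be reproduced on the example matrices with a minimal sketch. NumPy is an assumption (the question does not name a language), and mapping 0/0 to 0 rather than NaN is a choice made here only for illustration. The result shows the problem directly: every ratio is below 1 and the values are no longer counts, which is one reason such ratios are awkward to feed into count-based tools.

```python
import numpy as np

# Example matrices from the question (rows = genes, columns = samples).
A = np.array([
    [3, 3, 6, 2, 0, 0],
    [15, 10, 0, 8, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [3, 14, 2, 2, 2, 7],
], dtype=float)
B = np.array([
    [29.0, 31.5, 27, 29.5, 0.0, 0],
    [33.0, 37.5, 0, 34.0, 0.0, 0],
    [0.0, 0.0, 0, 0.0, 0.0, 0],
    [0.0, 0.0, 0, 0.0, 0.0, 0],
    [24.5, 22.5, 26, 23.5, 23.5, 24],
])


def safe_ratio(a, b):
    """Element-wise a / b, returning 0 where b == 0 instead of NaN or inf."""
    out = np.zeros_like(a, dtype=float)
    np.divide(a, b, out=out, where=b != 0)
    return out


ratio = safe_ratio(A, B)
print(ratio.round(3))
```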
I want to find differentially expressed genes across conditions (every 3 columns are replicates). I'm now stuck on how to normalize properly and what kind of model to fit for the statistical test.
Any help would be appreciated.
Best
You can perform gene-wise normalization in limma, edgeR and DESeq2 (this is how conditional quantile normalization works), but the bigger questions are:
Answering #2 will also tell you what type of model is appropriate.
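As a sketch of where gene-wise normalization enters such a model, here is the negative binomial GLM used by DESeq2, written with gene- and sample-specific normalization factors s_ij in place of per-sample size factors s_j. K_ij is the count for gene i in sample j, q_ij the normalized expression, x_jr the design matrix and β_ir the coefficients. Building s_ij from a second matrix such as matrix B is the assumption being illustrated here; on the log scale, s_ij acts as a known offset, so the counts themselves stay integers.

```latex
K_{ij} \sim \mathrm{NB}(\mu_{ij}, \alpha_i), \qquad
\mu_{ij} = s_{ij}\, q_{ij}, \qquad
\log_2 q_{ij} = \sum_r x_{jr}\, \beta_{ir}

\Longrightarrow \quad
\log_2 \mu_{ij} = \log_2 s_{ij} + \sum_r x_{jr}\, \beta_{ir}
```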
It's actually hard to explain; my goal is quite specific. The short answer is that the first count data depend on the second. Since I'm interested in the part of the variance in matrix A that is independent of matrix B, I would like to normalize matrix A against B.
OK, well then doing something like the following in DESeq2 may be the easiest:
Note that dds is a DESeqDataSet object. You can do something similar in edgeR and limma. I should mention that using matrix B directly, as I showed, may be the opposite of what you want: it will essentially bias the estimates toward values with high matrix B values. If you want the exact opposite, you'll need to modify matrix B accordingly before using it (still dividing it by its mean).
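The DESeq2 code referred to above is not reproduced here, so the following is only a language-neutral illustration in Python/NumPy, not the DESeq2 API. It turns matrix B into per-gene, per-sample factors by dividing each row by its mean, taken here as the geometric mean of the positive entries. That matches the convention I recall from the DESeq2 vignette's example of custom normalization factors, but treat it as an assumption. Setting factors to 1 where B is 0 and reading "modify matrix B accordingly" as using 1/B are also assumptions, and the helper name normalization_factors and its invert parameter are hypothetical.

```python
import numpy as np

# Example matrices from the question (rows = genes, columns = samples).
A = np.array([
    [3, 3, 6, 2, 0, 0],
    [15, 10, 0, 8, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [3, 14, 2, 2, 2, 7],
], dtype=float)
B = np.array([
    [29.0, 31.5, 27, 29.5, 0.0, 0],
    [33.0, 37.5, 0, 34.0, 0.0, 0],
    [0.0, 0.0, 0, 0.0, 0.0, 0],
    [0.0, 0.0, 0, 0.0, 0.0, 0],
    [24.5, 22.5, 26, 23.5, 23.5, 24],
])


def normalization_factors(b, invert=False):
    """Hypothetical helper: per-gene, per-sample factors derived from B.

    Each row is divided by the geometric mean of its positive entries, so
    every row of factors has a geometric mean of 1. Entries where B is 0
    (and, per the question, A is 0 too) get a factor of 1, an assumption
    made here so that every factor stays positive.
    """
    b = np.asarray(b, dtype=float)
    pos = b > 0
    # log(B) on positive entries, 0 elsewhere (avoids taking log(0)).
    logb = np.where(pos, np.log(np.where(pos, b, 1.0)), 0.0)
    if invert:
        logb = -logb  # use 1/B instead: the hypothetical "opposite" weighting
    n_pos = pos.sum(axis=1, keepdims=True)
    # Mean of log(B) over positive entries; 0 for rows that are all zero.
    row_mean = np.divide(logb.sum(axis=1, keepdims=True), n_pos,
                         out=np.zeros((b.shape[0], 1)), where=n_pos > 0)
    return np.where(pos, np.exp(logb - row_mean), 1.0)


nf = normalization_factors(B)
nf_inv = normalization_factors(B, invert=True)

# Counts scaled by the factors, roughly analogous to normalized counts.
print((A / nf).round(2))
```

With invert=True the factors are exactly the reciprocals of the default ones, so the only thing that changes is which entries of each row end up above or below 1.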