Question

Log expression for machine learning input

9.3 years ago

bharata1803 ▴ 580

Hello,

I have processed my RNA-seq read count data with both limma/voom and DESeq2. I then plotted the log fold changes from the two methods against each other. The result looked fine: most genes have a "similar" fold change, and the points fall roughly along the diagonal. Now I want to analyze the data further with statistical methods or machine learning techniques. I know I need to use the log expression of each gene as input for basic machine learning methods (clustering, regression, etc.). In DESeq2, I can use the rlog function, whose documentation states:

This function transforms the count data to the log2 scale in a way which minimizes differences between samples for rows with small counts, and which normalizes with respect to library size. The rlog transformation produces a similar variance stabilizing effect as varianceStabilizingTransformation, though rlog is more robust in the case when the size factors vary widely. The transformation is useful when checking for outliers or as input for machine learning techniques such as clustering or linear discriminant analysis.

My question is: which is better, the limma/voom log expression (I think this is the output of voom) or the DESeq2 rlog values? I'm not really familiar with how voom transforms raw counts into log expression. For DESeq2, the transformation seems simpler because it doesn't use precision weights as voom does.
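For readers wondering what the voom log expression values are: as far as I understand it, voom's expression matrix holds log2 counts per million, computed as log2((count + 0.5) / (library size + 1) × 10^6). The precision weights are estimated separately and are used in the linear model fit, not baked into the values themselves. Below is a minimal sketch of that transformation in plain Python. The function name `log_cpm` and the toy counts are hypothetical, and the sketch assumes raw column sums as library sizes, whereas limma may use normalized (effective) library sizes.

```python
import math


def log_cpm(counts):
    """Return log2 counts-per-million for a genes x samples count matrix.

    counts: list of rows (one per gene), each a list of raw read counts
    (one per sample). Uses offsets of 0.5 on the count and 1 on the
    library size so that zero counts stay finite.
    """
    n_samples = len(counts[0])
    # Assumption: library size is the plain column sum (no TMM-style scaling).
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [
        [
            math.log2((row[j] + 0.5) / (lib_sizes[j] + 1.0) * 1e6)
            for j in range(n_samples)
        ]
        for row in counts
    ]


# Toy matrix: 3 genes x 2 samples (made-up numbers, second sample sequenced
# roughly twice as deeply as the first).
toy_counts = [
    [10, 20],
    [0, 5],
    [990, 1975],
]
toy_logcpm = log_cpm(toy_counts)
```

In this toy case the first gene has twice the raw count in the second sample, but its log-CPM values end up close to each other because the library size is also doubled.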

limma rna-seq deseq2 • 3.2k views

ADD COMMENT • link updated 2.8 years ago by Ram 45k • written 9.3 years ago by bharata1803 ▴ 580

BTW, I think you are confusing rlog with log fold change. They are not the same. Log fold change is what you get from a differential expression analysis. For exploratory data analysis, you need normalized counts or transformed values.

ADD REPLY • link 9.3 years ago by GouthamAtla 12k

Sorry if I wrote it confusingly. Basically, I am comparing the limma/voom and DESeq2 methods. I compared the two sets of results by plotting their log fold changes against each other, and my conclusion is that DESeq2 and limma/voom give essentially the same results. That leaves me with two options: use the DESeq2 data or the limma/voom data. I want to apply statistical methods such as correlation, and I know I need log-transformed gene expression for that. If I use the DESeq2 data, I need the output of the rlog function. If I use the limma/voom data, I need the output of voom. Given those two options, I just want to know which is more suitable for further statistical analysis, voom or rlog, in terms of how the log transform is calculated.
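As an illustration of the kind of downstream analysis meant here, the following is a minimal sketch of a Pearson correlation between two samples' log2 expression profiles. The values are made up, and `pearson`, `sample_a`, and `sample_b` are hypothetical names. The same function would apply whether the input comes from rlog or from voom.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical log2 expression values for five genes in two samples.
sample_a = [2.1, 5.3, 7.8, 3.3, 9.0]
sample_b = [2.4, 5.0, 8.1, 3.0, 9.4]
r = pearson(sample_a, sample_b)
```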

ADD REPLY • link 9.3 years ago by bharata1803 ▴ 580

Basically, if I am not wrong, you need to extract the normalized expression values for your genes across the different samples. You can extract them with either DESeq2 or limma. You will then have a matrix of genes with expression values across samples, which you can use for downstream machine learning. However, keep in mind that machine learning needs a large set of samples, and I'm not sure whether you have more than 100 to 200. Only then can you use unsupervised or supervised methods, since you need both training and test data sets.
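To make the training/test point concrete, here is a minimal sketch of a random hold-out split over sample identifiers. The function name `split_samples`, the 25% test fraction, the seed, and the sample names are all assumptions for illustration, not a recommendation for any particular data set.

```python
import random


def split_samples(sample_ids, test_fraction=0.25, seed=0):
    """Randomly split sample IDs into (train, test) lists.

    A fixed seed keeps the split reproducible between runs.
    """
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    # Hold out at least one sample for testing.
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical cohort of 20 samples.
sample_ids = [f"sample_{i}" for i in range(1, 21)]
train_ids, test_ids = split_samples(sample_ids)
```

The split is done on samples (columns of the expression matrix), so any model would be fit on the training columns and evaluated on the held-out ones.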

ADD REPLY • link 9.2 years ago by ivivek_ngs ★ 5.2k