4.5 years ago
lenC_biotecLover
I have a matrix of miRNA RPKM values downloaded from TCGA, covering several TCGA projects (BRCA, LAML, LUAD, etc.). Columns are TCGA barcodes and rows are miRNA identifiers.
To run a machine learning analysis, how can I normalize these data across the patients in my matrix? I have searched the web but could not find an answer.
I'm a novice in bioinformatics and computational biology, and any advice is greatly appreciated. Thank you very much.
I know, but I meant between the patients, considering that I have data from different projects.
You can convert the RPKM values to log scale and then perform vst.
Thank you. After this, once I have the vst-normalized data (using the DESeq2 package, right?), is it equivalent to counts data transformed with the same vst function? For instance, if I have an RPKM dataset converted first to log scale and then with vst, and also a counts dataset normalized with the vst function, are they comparable in terms of normalization? Thank you very much.
@dare_devil, OK, I tried, but the log-scaled RPKM values are negative in some cases and the vst function doesn't work on negative values. How can I handle this?
You should have a matrix of RPKM values greater than or equal to 1. To achieve this, add 1 to the entire data frame and then convert to log scale; this avoids negative values.
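The add-1-then-log step suggested above can be sketched in a few lines. This is only an illustration in plain Python, not the DESeq2 workflow; the function name `log2_plus_one` and the toy matrix are made up for the example, and it assumes the RPKM values themselves are non-negative.

```python
import math

def log2_plus_one(matrix):
    """Return log2(x + 1) for every value of a list-of-lists matrix.

    Assumes non-negative input (e.g. RPKM), so every output is >= 0.
    """
    out = []
    for row in matrix:
        for value in row:
            if value < 0:
                raise ValueError("RPKM values are expected to be non-negative")
        out.append([math.log2(value + 1.0) for value in row])
    return out

# Hypothetical toy matrix: rows = miRNAs, columns = samples.
rpkm = [
    [0.0, 0.5, 3.0],
    [1.0, 7.0, 1023.0],
]
logged = log2_plus_one(rpkm)
```

Because the smallest shifted value is 1, the smallest possible result is log2(1) = 0, which is why the negative values disappear.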
Thank you.
Now the problem is that I downloaded some data from GEO (tumoral breast vs. normal breast samples), accession GSE68085. I suppose that the data are already log2-normalized, and they contain some negative values. I want to use this dataset for validation (I'm using an SVM classifier). I downloaded the series matrix and used the batch ID information for batch correction with the ComBat function. Should I apply the inverse of the log transform (exponentiate) and then apply vst? Thank you very much again.
In this case, I would suggest nneg in the NMF package. This will convert all negative values to 0. You can go through this link for other methods.
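To make the effect of that suggestion concrete, here is a minimal sketch of truncating negative values to 0. It only mimics the behaviour described in the reply and is not the NMF package's implementation; `clip_negatives` and the toy values are hypothetical.

```python
def clip_negatives(matrix):
    """Replace every negative entry with 0.0 and leave the rest unchanged."""
    return [[max(value, 0.0) for value in row] for row in matrix]

# Hypothetical expression values containing a few negatives.
expr = [
    [-1.2, 0.4],
    [2.5, -0.1],
]
clipped = clip_negatives(expr)
```

Note that truncation throws away whatever information the negative values carried, so it is worth checking how many entries are affected before relying on it.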
You can convert the log2-scaled data back to their corresponding RPKM values using the inverse function. I looked at your data, GSE68085, but I don't think they are log-transformed values.
Thank you! OK, but these data are described as "normalized". I can't tell what type of normalization they did. Do they just mean RPKM? And if so, why are there negative values? I read the series matrix and could not find any other useful information. Thanks again.
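For reference, the inverse transform mentioned in the reply above would look roughly like this. It is a sketch that is only meaningful if the values really are log2-scaled, which the reply doubts for GSE68085; the `pseudocount` parameter is an assumption for the case where the data were stored as log2(x + 1).

```python
def inverse_log2(matrix, pseudocount=0.0):
    """Undo a log2(x + pseudocount) transform: returns 2**value - pseudocount."""
    return [[2.0 ** value - pseudocount for value in row] for row in matrix]

# Hypothetical log2-scale values, including one negative entry.
logged = [
    [0.0, 3.0],
    [-1.0, 10.0],
]
plain = inverse_log2(logged)                       # data stored as log2(x)
shifted = inverse_log2(logged, pseudocount=1.0)    # data stored as log2(x + 1)
```

A negative log2 value maps back to a number between 0 and 1 under plain log2, but to a negative number under the log2(x + 1) convention. Negative entries after inversion would therefore hint that the data were not produced by log2(x + 1) on non-negative values.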
You can download the data and redo the analysis. You can find its raw data here for download