limma voom normalization of train and test set
4.2 years ago
asalimih ▴ 60

Hello,
I am using TCGA RNA-Seq data for a neural net method, and I have split the data into a train set and a test set. How do I normalize the count data of the train and test sets with limma without data leakage?
My idea is to normalize the train set with the limma + voom method first, and then use the quantities voom computed on the train set (E and weights) to normalize the test set. However, I couldn't work out how to do that.

Any help would be greatly appreciated.

limma machine learning RNA-Seq data leakage • 1.5k views

I'd normalize them separately. Simply make two matrices, one test and one train, and run the normalization on each of them.


Thank you. After searching some more, it seems that using E and weights is not correct for this purpose: those quantities are meant for lmFit and limma itself. Gordon Smyth recommended this:

I always recommend cpm(counts, log=TRUE, prior.count=3) for the purpose of other down-stream analyses, because the voom quantities cannot be summarized in single combined quantity. link
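For illustration, here is a minimal sketch of how that recommendation could be applied to a train/test split, assuming edgeR is installed. The simulated count matrix, the 7/3 split, and the variable names are made up for the example and are not from the original data. Each set is transformed on its own, so nothing computed on the train samples is reused for the test samples.

```r
library(edgeR)

set.seed(1)
# Simulated stand-in for a TCGA count matrix (hypothetical data):
# genes in rows, samples in columns.
counts <- matrix(
  rnbinom(200 * 10, mu = 50, size = 2),
  nrow = 200,
  dimnames = list(paste0("gene", 1:200), paste0("s", 1:10))
)

# Hypothetical split: first 7 samples for training, the rest for testing.
train_idx <- 1:7
train_counts <- counts[, train_idx]
test_counts <- counts[, -train_idx]

# log2 counts per million with a prior count of 3, computed per set.
train_logcpm <- cpm(train_counts, log = TRUE, prior.count = 3)
test_logcpm <- cpm(test_counts, log = TRUE, prior.count = 3)

# Most neural-net libraries expect samples in rows, so transpose.
x_train <- t(train_logcpm)
x_test <- t(test_logcpm)
```

The resulting x_train and x_test matrices share the same gene columns in the same order, which is what a downstream model would need.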


Yes, just run the default normalization as in the manual: calcNormFactors() and then cpm() with the log transformation. Alternatively, something like vst() from DESeq2 might be of interest here.
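A rough sketch of that workflow, assuming edgeR is installed, could look like the following. The helper name normalize_logcpm and the simulated counts are hypothetical and only there to make the example self-contained. calcNormFactors() uses TMM by default, which picks a reference sample from within the matrix it is given, so calling the helper once per set keeps the train and test normalizations independent of each other.

```r
library(edgeR)

# Hypothetical helper: TMM normalization followed by log2-CPM.
normalize_logcpm <- function(count_matrix) {
  y <- DGEList(counts = count_matrix)
  y <- calcNormFactors(y)  # TMM scaling factors (the default method)
  # cpm() on a DGEList uses the normalized (effective) library sizes.
  cpm(y, log = TRUE, prior.count = 3)
}

set.seed(2)
# Simulated counts (hypothetical): 300 genes, 12 samples.
counts <- matrix(
  rnbinom(300 * 12, mu = 100, size = 5),
  nrow = 300,
  dimnames = list(paste0("gene", 1:300), paste0("s", 1:12))
)

# Hypothetical split: 8 training samples, 4 test samples.
train_counts <- counts[, 1:8]
test_counts <- counts[, 9:12]

# Normalize each set separately, as suggested in the thread.
train_norm <- normalize_logcpm(train_counts)
test_norm <- normalize_logcpm(test_counts)
```

Both outputs keep genes in rows and samples in columns, so they may need to be transposed with t() before being passed to a model.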
