Normalising and splitting TCGA data into train and test sets
3 months ago
RajSehmbi

Hello all,

I am currently working on a project where I would like to use RNA-seq data as input to PCA and neural-network autoencoders, but I have never used this type of data before. I've downloaded the GDC TCGA datasets from XenaBrowser (separately for each cancer type, not the pan-cancer one). I downloaded the raw-counts version, which has been log-transformed.
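
As far as I know, the Xena GDC count matrices are stored as log2(count + 1); assuming that (check the dataset's description page), the transform is undone with 2^x - 1, and the result should be rounded back to integers because DESeq2 expects integer counts. A minimal numpy sketch, where the function name is hypothetical:

```python
import numpy as np

def undo_log2_plus_one(log_counts: np.ndarray) -> np.ndarray:
    """Invert an ASSUMED log2(count + 1) transform and return integer counts (genes x samples)."""
    counts = np.exp2(log_counts) - 1.0
    # Round away floating-point error from the round trip and clip tiny negatives to 0.
    return np.clip(np.rint(counts), 0, None).astype(np.int64)

# Toy example: a 2-gene x 3-sample matrix stored in log2(count + 1) space.
log_matrix = np.log2(np.array([[0, 5, 100], [10, 0, 3]]) + 1)
print(undo_log2_plus_one(log_matrix))
```

If the inverted values are not close to whole numbers, that suggests a different transform (or base) was used.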

I have already combined all the data into one large dataset, undone the log transformation, and performed a stratified 80/20 train/test split.
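
A stratified 80/20 split keeps each class at roughly the same proportion in both sets. The sketch below assumes the stratification variable is the cancer type of each sample; the labels and function name are hypothetical:

```python
import numpy as np

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx); each class sends ~test_fraction of its samples to test."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)  # sample positions belonging to this class
        rng.shuffle(idx)
        n_test = int(round(test_fraction * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.sort(np.array(train_idx, dtype=int)), np.sort(np.array(test_idx, dtype=int))

# Hypothetical labels: 10 BRCA and 5 LUAD samples.
labels = np.array(["BRCA"] * 10 + ["LUAD"] * 5)
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)
print(len(train_idx), len(test_idx))  # 12 3
```

scikit-learn's train_test_split with stratify=labels does the same job if that library is already in use.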

From here, I would like to apply DESeq2 normalisation and the VST to the training set, and then reuse the parameters estimated on the training set to normalise and transform the test set. How can I do this, and does it make sense for my use case?
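
For the normalisation step, one way to reuse training parameters is to compute DESeq2-style median-of-ratios size factors against per-gene geometric means taken from the training samples only, so a test sample never influences the reference. This is a numpy sketch of that idea, not DESeq2's own code; DESeq2's estimateSizeFactors also accepts a geoMeans argument for supplying precomputed geometric means, which may be the closest built-in equivalent (its documentation describes how it rescales the result). Function names here are hypothetical:

```python
import numpy as np

def fit_log_geomeans(train_counts):
    """Per-gene log geometric means over TRAINING samples only (genes x samples).

    Genes with a zero in any training sample get -inf and are skipped later.
    """
    with np.errstate(divide="ignore"):
        return np.log(train_counts).mean(axis=1)

def size_factors(counts, log_geomeans):
    """Median-of-ratios size factor per column, relative to FIXED log geometric means."""
    usable = np.isfinite(log_geomeans)
    sf = np.empty(counts.shape[1])
    for j in range(counts.shape[1]):
        col = counts[:, j]
        keep = usable & (col > 0)  # ignore reference-less genes and zero counts
        sf[j] = np.exp(np.median(np.log(col[keep]) - log_geomeans[keep]))
    return sf

train = np.array([[10, 20], [100, 200], [0, 5]])  # 3 genes x 2 training samples
test = np.array([[30], [300], [7]])               # same genes, 1 test sample
log_gm = fit_log_geomeans(train)                  # estimated on the training set only
sf_train = size_factors(train, log_gm)
sf_test = size_factors(test, log_gm)              # reuses the training reference
norm_test = test / sf_test                        # divide each column by its size factor
```

Because the reference is frozen, normalising a test sample gives the same result whether it is processed alone or together with other test samples.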

I would also prefer to use the vst() wrapper function, as the base varianceStabilizingTransformation() function takes far too long with the number of samples I have.
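
As I understand the DESeq2 documentation, vst() is faster because it estimates the dispersion trend on a subset of genes spanning the range of mean normalised counts (its nsub argument, 1000 by default) and then applies a closed-form transform for the parametric trend dispersion = asymptDisp + extraPois / mean. If that trend is fitted on the training set and kept fixed, the same formula can be applied to test samples. The sketch below illustrates this with two stated simplifications: per-gene dispersions are rough method-of-moments estimates, and the trend is fitted by ordinary least squares rather than DESeq2's gamma-family GLM. The closed form is derived here for Var = mu + (a0 + a1/mu) * mu^2 and is not copied from DESeq2. Names and the synthetic data are hypothetical:

```python
import numpy as np

def fit_parametric_trend(norm_counts, nsub=1000):
    """Fit alpha(mu) = a0 + a1/mu on a deterministic subset of genes (rows) of training data."""
    means = norm_counts.mean(axis=1)
    variances = norm_counts.var(axis=1, ddof=1)
    expressed = np.flatnonzero(means > 0)
    # Evenly spaced genes after sorting by mean, so the subset spans the expression range.
    order = expressed[np.argsort(means[expressed])]
    take = np.unique(np.linspace(0, len(order) - 1, min(nsub, len(order))).astype(int))
    mu = means[order[take]]
    alpha = (variances[order[take]] - mu) / mu**2  # rough method-of-moments dispersions
    design = np.column_stack([np.ones_like(mu), 1.0 / mu])
    (a0, a1), *_ = np.linalg.lstsq(design, alpha, rcond=None)
    # Keep coefficients where the closed-form transform is defined.
    return max(a0, 1e-8), max(a1, 0.0)

def vst_transform(norm_counts, a0, a1):
    """Closed-form VST for Var = mu + (a0 + a1/mu) * mu**2; behaves like log2(q) for large q."""
    q = np.asarray(norm_counts, dtype=float)
    b = 1.0 + a1
    return np.log2((b + 2 * a0 * q + 2 * np.sqrt(a0 * q * (b + a0 * q))) / (4 * a0))

# Synthetic negative-binomial counts with true trend alpha = 0.1 + 2/mu.
# All size factors are 1 here, so raw counts stand in for normalised counts.
rng = np.random.default_rng(0)
mu_true = np.exp(rng.uniform(np.log(5), np.log(1000), size=2000))
r = 1.0 / (0.1 + 2.0 / mu_true)
p = r / (r + mu_true)
train = rng.negative_binomial(r[:, None], p[:, None], size=(2000, 80))
test = rng.negative_binomial(r[:, None], p[:, None], size=(2000, 20))
a0, a1 = fit_parametric_trend(train)    # fitted on training samples only
vst_test = vst_transform(test, a0, a1)  # same (a0, a1) reused for test samples
```

The fixed (a0, a1) pair plays the role of the training-set dispersion trend; in DESeq2 itself the analogous step would be reusing the training dataset's dispersion function rather than refitting on the test data.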

RNA-Seq vst DESeq2 TCGA