Normalising and splitting TCGA data into train and test sets


RajSehmbi, 3 months ago

Hello all,

I am working on a project in which I would like to use RNA-seq data as input to PCA and to neural-network autoencoders, but I have never worked with this type of data before. I downloaded the GDC TCGA datasets from Xena Browser (separately for each cancer type, not the pan-cancer set), choosing the raw-count version, which has already been log-transformed.

I have already combined all the data into one large dataset, undone the log transformation, and performed a stratified 80/20 train/test split.
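For reference, the two preprocessing steps above might look like the following minimal Python sketch. It assumes the Xena matrix stores log2(count + 1) values (worth checking in the dataset's description on Xena before relying on this) and that each sample is labelled with its cancer type; undo_log2_plus_one and stratified_split are hypothetical helper names, not part of any library.

```python
import numpy as np


def undo_log2_plus_one(log_counts):
    """Invert an assumed log2(count + 1) transform and round back to integer counts."""
    counts = np.power(2.0, np.asarray(log_counts, dtype=float)) - 1.0
    # Rounding removes floating-point error; clipping guards against tiny negatives.
    return np.clip(np.rint(counts), 0, None).astype(np.int64)


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so each label keeps roughly the same proportion."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for label in np.unique(labels):
        idx = np.flatnonzero(labels == label)  # samples with this label
        rng.shuffle(idx)
        n_test = int(round(test_fraction * idx.size))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.sort(train_idx), np.sort(test_idx)
```

Stratifying by cancer type keeps each project's share roughly equal in both sets; scikit-learn's train_test_split with its stratify argument does the same job if that library is available.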

Next, I would like to apply DESeq2 normalisation and the variance-stabilising transformation (vst) to the training set, and then use the parameters fitted on the training set to normalise and transform the test set. How can I do this, and does this approach make sense for my use case?
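The fit-on-train, apply-to-test idea for normalisation can be sketched with the median-of-ratios method that DESeq2 uses for size factors: the per-gene geometric means (the pseudo-reference sample) are computed from the training counts only, and every sample, training or test, is then scaled against that fixed reference. This is a simplified Python sketch, not DESeq2's implementation; it assumes a genes x samples count matrix, and fit_log_geo_means and size_factors are hypothetical names. On the R side, estimateSizeFactors accepts a geoMeans argument that, as far as I know, serves the same purpose.

```python
import numpy as np


def fit_log_geo_means(train_counts):
    """Per-gene log geometric means from a genes x samples training matrix.

    A gene with a zero in any training sample gets -inf and is skipped later.
    """
    with np.errstate(divide="ignore"):
        return np.log(np.asarray(train_counts, dtype=float)).mean(axis=1)


def size_factors(counts, log_geo_means):
    """Median-of-ratios size factor for each column (sample) of `counts`."""
    counts = np.asarray(counts, dtype=float)
    usable = np.isfinite(log_geo_means)
    factors = []
    for sample in counts.T:
        keep = usable & (sample > 0)  # genes usable as reference for this sample
        log_ratios = np.log(sample[keep]) - log_geo_means[keep]
        factors.append(np.exp(np.median(log_ratios)))
    return np.array(factors)


train = np.array([[10, 20], [100, 200], [0, 5]])  # 3 genes x 2 training samples
test = np.array([[30], [300], [7]])               # same genes, 1 test sample

log_gm = fit_log_geo_means(train)      # fitted on training samples only
sf_train = size_factors(train, log_gm)
sf_test = size_factors(test, log_gm)   # test reuses the training reference
normalised_test = test / sf_test
```

Because the reference is frozen, adding or removing test samples does not change any other sample's size factor, which is the property a held-out set needs.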

I would also prefer to use the vst wrapper function, because the full 'varianceStabilizingTransformation' function takes far too long with the number of samples I have.
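On the speed question: as I understand it, vst and varianceStabilizingTransformation apply the same closed-form transformation when the default parametric dispersion fit is used; the wrapper is faster mainly because it fits the dispersion trend on a subset of genes (its nsub argument). The transformation itself only needs two fitted trend coefficients from alpha(mu) = asymptDisp + extraPois / mu, so fitting them on the training set and reusing them for the test set can be sketched as below. The Python function parametric_vst and the coefficient values are illustrative assumptions, not DESeq2 code.

```python
import numpy as np


def parametric_vst(normalised_counts, asympt_disp, extra_pois):
    """Variance-stabilising transform for the parametric dispersion trend
    alpha(mu) = asympt_disp + extra_pois / mu, returned on a log2-like scale.

    `normalised_counts` are counts divided by their size factors. The two
    coefficients are assumed to come from a trend fitted on training data only.
    """
    q = np.asarray(normalised_counts, dtype=float)
    a0, a1 = asympt_disp, extra_pois
    inner = 1.0 + a1 + 2.0 * a0 * q + 2.0 * np.sqrt(a0 * q * (1.0 + a1 + a0 * q))
    # Dividing by 4 * a0 makes the result approach log2(q) for large counts.
    return np.log2(inner / (4.0 * a0))


# Hypothetical coefficients standing in for a trend fitted on the training set.
ASYMPT_DISP, EXTRA_POIS = 0.1, 1.0

train_norm = np.array([[0.0, 12.5], [480.0, 1020.0]])
test_norm = np.array([[3.0], [750.0]])
vst_train = parametric_vst(train_norm, ASYMPT_DISP, EXTRA_POIS)
vst_test = parametric_vst(test_norm, ASYMPT_DISP, EXTRA_POIS)  # same coefficients
```

In R, I believe the help page for varianceStabilizingTransformation describes the equivalent step: copy the training object's dispersionFunction onto the test object and transform it with blind=FALSE.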

RNA-Seq vst DESeq2 TCGA
