Hello all,
I am working on a project where I would like to use RNA-seq data as input to PCA and neural-network autoencoders, but I have never worked with this type of data before. I downloaded the GDC TCGA datasets from XenaBrowser (separately for each cancer type, not the pan-cancer set), using the raw count files, which Xena provides log-transformed.
I have already combined all the data into one large dataset, undone the log transformation, and performed a stratified 80/20 train-test split.
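If the Xena GDC count matrices are stored as log2(count + 1) (an assumption; check the unit listed on each dataset's page), undoing the transform means computing 2^x - 1 and then rounding, because DESeq2 only accepts non-negative integer counts and the round trip leaves floating-point noise. A minimal NumPy sketch, with a hypothetical helper name:

```python
import numpy as np


def undo_xena_log(log_counts: np.ndarray) -> np.ndarray:
    """Invert an ASSUMED log2(count + 1) transform and return integer counts."""
    counts = np.power(2.0, log_counts) - 1.0
    # The round trip leaves values like 4.9999999; DESeq2 needs integers.
    counts = np.rint(counts)
    # Guard against tiny negative values from round-off on zero counts.
    counts = np.clip(counts, 0, None)
    return counts.astype(np.int64)


if __name__ == "__main__":
    log_mat = np.log2(np.array([[0, 5, 100], [12, 0, 3]], dtype=float) + 1)
    print(undo_xena_log(log_mat))
```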
From here, I would like to apply DESeq2 normalisation and the VST to the training set, and then reuse the parameters fitted on the training set (the size-factor reference and the dispersion trend) to normalise and transform the test set. How can I do this, and does it make sense for my use case?
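A minimal Python/NumPy sketch of the normalisation half, assuming the counts are a genes x samples integer matrix (the layout DESeq2 uses). The per-gene log geometric means, which serve as the median-of-ratios reference, are computed on the training set only; the same reference is then reused to compute size factors for the test samples, so a test sample's factor does not depend on the other test samples. The function names are hypothetical, not part of DESeq2.

```python
import numpy as np


def fit_log_geomeans(train_counts: np.ndarray) -> np.ndarray:
    """Per-gene mean of log counts over the training samples (genes x samples).

    Genes with a zero in any training sample get -inf and are skipped
    when size factors are computed, as in the median-of-ratios method.
    """
    with np.errstate(divide="ignore"):
        return np.log(train_counts).mean(axis=1)


def size_factors(counts: np.ndarray, log_geomeans: np.ndarray) -> np.ndarray:
    """Median-of-ratios size factor per sample against a fixed (training) reference."""
    usable = np.isfinite(log_geomeans)
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts)
    sf = np.empty(counts.shape[1])
    for j in range(counts.shape[1]):
        # Use genes with a finite reference and a non-zero count in this sample.
        keep = usable & (counts[:, j] > 0)
        sf[j] = np.exp(np.median(log_counts[keep, j] - log_geomeans[keep]))
    return sf


if __name__ == "__main__":
    train = np.array([[10, 20], [100, 200], [0, 7], [50, 100]])
    test = np.array([[30], [300], [4], [150]])

    ref = fit_log_geomeans(train)          # fitted on training only
    sf_train = size_factors(train, ref)
    sf_test = size_factors(test, ref)      # reuses the training reference
    norm_test = test / sf_test[None, :]    # normalised counts, ready for the VST
    print(sf_train, sf_test)
```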
I would also prefer to use the vst() wrapper, as the base varianceStabilizingTransformation() function takes far too long with the number of samples I have.
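For the transformation half: with the default fitType = "parametric", DESeq2 models the dispersion trend as asymptDisp + extraPois / mean, and the VST is then a closed-form function of each normalised count. The vst() wrapper is faster mainly because it estimates dispersions on a random subset of nsub genes (1000 by default) before fitting that trend; applying the transform itself is cheap. So once the two trend coefficients are fitted on the training set, they can in principle be frozen and applied to the test set without refitting. As I understand the DESeq2 help page for varianceStabilizingTransformation, the R equivalent is to copy the training object's dispersionFunction() onto the test object and call varianceStabilizingTransformation(..., blind = FALSE). The Python sketch here is only my translation of that closed form: the coefficient values are hypothetical, and reading the real ones from the "coefficients" attribute of the training dispersionFunction() is an assumption to verify.

```python
import numpy as np


def frozen_vst(norm_counts, asympt_disp: float, extra_pois: float) -> np.ndarray:
    """Closed-form VST for the parametric trend alpha(mu) = asympt_disp + extra_pois / mu.

    Output is on a log2-like scale (approximately log2(q) for large counts q).
    The coefficients are ASSUMED to come from the training fit and are never
    re-estimated, so test samples are transformed with training parameters.
    """
    q = np.asarray(norm_counts, dtype=float)
    a0, a1 = asympt_disp, extra_pois
    inner = 1.0 + a1 + 2.0 * a0 * q + 2.0 * np.sqrt(a0 * q * (1.0 + a1 + a0 * q))
    return np.log2(inner / (4.0 * a0))


if __name__ == "__main__":
    # Hypothetical coefficients; in practice take them from the training fit.
    a0, a1 = 0.05, 2.0
    norm_test = np.array([[0.0, 10.0], [1e3, 1e5]])
    print(frozen_vst(norm_test, a0, a1))
```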