The objective of my task is binary classification of the HPV status of head and neck cancer patients using multi-modal data: genomics, transcriptomics and histopathology.
For the transcriptomics data, I downloaded mRNA-seq files containing raw counts, as well as counts normalised with the TPM, FPKM and FPKM-UQ methods. However, it seems that these normalised formats are not the preferred choice, and raw counts are often used directly as input to DESeq2 or edgeR normalisation.
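For reference, the three normalised formats differ only in how each sample's raw counts are rescaled. Below is a simplified sketch, assuming a genes × samples count matrix and gene lengths in base pairs. The denominators used here (all genes in the matrix for FPKM, the upper quartile over all genes for FPKM-UQ) are assumptions; the pipeline that produced the downloaded files may define them differently, e.g. restricted to protein-coding genes.

```python
import numpy as np

# Simplified sketch (assumption): counts is a genes x samples matrix of raw counts,
# lengths_bp holds gene lengths in base pairs. The denominators below (all genes in
# the matrix; upper quartile over all genes) are assumptions and may differ from the
# pipeline that produced the downloaded files.

def tpm(counts, lengths_bp):
    # Length-normalise first (reads per kilobase), then scale each sample to sum to 1e6.
    rpk = counts / (lengths_bp[:, None] / 1e3)
    return rpk / rpk.sum(axis=0, keepdims=True) * 1e6

def fpkm(counts, lengths_bp):
    # Scale by library size (total counts per sample) and by gene length.
    library_size = counts.sum(axis=0, keepdims=True)
    return counts * 1e9 / (lengths_bp[:, None] * library_size)

def fpkm_uq(counts, lengths_bp):
    # Like FPKM, but divides by the per-sample 75th-percentile count instead of the total.
    upper_quartile = np.percentile(counts, 75, axis=0, keepdims=True)
    return counts * 1e9 / (lengths_bp[:, None] * upper_quartile)

# Toy example: 3 genes x 2 samples; sample 2 is sample 1 sequenced twice as deeply.
counts = np.array([[10, 20], [30, 60], [60, 120]], dtype=float)
lengths = np.array([1000.0, 2000.0, 3000.0])
print(tpm(counts, lengths))      # identical columns: the depth difference is removed
print(fpkm(counts, lengths))
print(fpkm_uq(counts, lengths))
```

None of these transformations uses the class labels; each one only rescales a sample's counts using that sample's own totals and the gene lengths.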
I did some further reading on the DESeq2 and edgeR normalisation methods, but they appear to use the label information, which I want to avoid, since it would then require an independent test set, etc.
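For comparison, the median-of-ratios size factors associated with DESeq2 can be computed from the count matrix alone. The sketch below is a simplified Python re-implementation of that idea, not DESeq2's code, assuming a genes × samples matrix of raw counts. Fitting the per-gene reference on the training samples only and reusing it for held-out samples is an assumption made here to keep the test set untouched; the function names are hypothetical.

```python
import numpy as np

# Simplified median-of-ratios sketch (assumption: genes x samples raw counts).
# Hypothetical helpers, not DESeq2's API.

def log_geo_means(train_counts):
    # Per-gene log geometric mean across the training samples.
    # A gene with a zero in any training sample gets -inf and is excluded later.
    with np.errstate(divide="ignore"):
        return np.log(train_counts).mean(axis=1)

def size_factors(counts, log_ref):
    # Median over genes of log(count / reference), exponentiated, for each sample.
    usable = np.isfinite(log_ref)  # genes with non-zero counts in every training sample
    with np.errstate(divide="ignore"):
        log_ratios = np.log(counts[usable]) - log_ref[usable][:, None]
    sf = np.empty(counts.shape[1])
    for j in range(counts.shape[1]):
        col = log_ratios[:, j]
        # Genes with a zero count in this sample give -inf and are skipped.
        sf[j] = np.exp(np.median(col[np.isfinite(col)]))
    return sf

# Toy data: 4 genes; training sample 2 has twice the depth of sample 1,
# the test sample three times; gene 4 has a zero in training and is ignored.
train = np.array([[10, 20], [40, 80], [5, 10], [0, 7]], dtype=float)
test = np.array([[30], [120], [15], [4]], dtype=float)

ref = log_geo_means(train)             # fitted on training samples only (assumption)
sf_train = size_factors(train, ref)
sf_test = size_factors(test, ref)      # test samples scaled against the same reference
norm_train = np.log1p(train / sf_train)
norm_test = np.log1p(test / sf_test)   # log-scale normalised counts for downstream models
print(sf_train, sf_test)
```

No labels enter this computation, so normalised features for a held-out sample depend only on that sample and the reference fitted on the training set.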
Before feature selection, I would like to normalise the raw counts, but even after days of reading I am still not sure which format of RNA-seq data to use. Could anyone advise me on how to proceed?
Thank you very much.
Thank you very much for sharing your advice. Yes, it definitely makes sense that a deep learning model can learn the normalisation it needs from the data itself. This reply saved me a lot of headaches and time! Thank you :)