Hi friends, I hope you are all doing well.
I want to validate my TCGA analysis with CPTAC transcriptomic data. I don't know why, but in the validation my model classifies all patients as one phenotype (I expected two: with and without a feature).
Do you have any idea why this is happening? Do you also know what form the CPTAC data is in? Is it z-scores? I transformed my TCGA data to z-scores to be consistent with the CPTAC data.
Hello,
Could you explain more about which TCGA data you are using? Feel free to post 3-5 rows showing the selected features and label(s), so we can see how the binary classification problem is set up.
Thanks @Ernest for responding. I used the HTSeq raw count data from TCGA. I normalized the counts and then calculated z-scores. Then I built models and applied the best one (classifier) to the CPTAC data to validate my work.
This is my TCGA data after normalization and conversion to z-scores:
Thanks @Rob. I understand that data scaling (normalization) is the next step after the data split in a machine learning project workflow, but why do you need to calculate the z-score? Can you share the link where you downloaded the HTSeq raw count data of TCGA? Feel free to read the following blog post: Apply Machine Learning Algorithms for Genomics Data Classification.
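One possible cause worth ruling out (an assumption, since the thread does not show how the CPTAC values were scaled) is a scale mismatch between the two cohorts. If the per-gene means and standard deviations fitted on TCGA are applied to CPTAC values that sit on a different scale, every validation sample can land on the same side of the decision boundary. The sketch below uses made-up numbers and a hypothetical one-gene threshold rule in place of a real classifier, purely to illustrate the effect; `fit_gene_stats`, `z_score`, and `toy_classifier` are illustrative names, not part of any library.

```python
from statistics import mean, pstdev

def fit_gene_stats(matrix):
    """Per-gene (mean, std) from a samples-by-genes matrix."""
    genes = list(zip(*matrix))  # transpose to genes-by-samples
    # Fall back to 1.0 when a gene has zero variance to avoid dividing by zero.
    return [(mean(g), pstdev(g) or 1.0) for g in genes]

def z_score(matrix, stats):
    """Apply per-gene z-scoring using the supplied (mean, std) pairs."""
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in matrix]

def toy_classifier(z_row):
    """Hypothetical stand-in for a trained model: class 1 if gene 0 is above average."""
    return 1 if z_row[0] > 0 else 0

# Made-up expression values (rows = patients, columns = genes).
tcga_expr = [[5.0, 2.0], [7.0, 4.0], [9.0, 6.0]]
# Made-up CPTAC values that are assumed to sit on a different scale.
cptac_expr = [[0.5, -1.0], [1.0, 0.0], [1.5, 1.0]]

tcga_stats = fit_gene_stats(tcga_expr)

# Case 1: CPTAC scaled with TCGA statistics (scale mismatch).
cptac_mismatched = z_score(cptac_expr, tcga_stats)
preds_mismatched = [toy_classifier(row) for row in cptac_mismatched]

# Case 2: CPTAC scaled with its own per-gene statistics.
cptac_own = z_score(cptac_expr, fit_gene_stats(cptac_expr))
preds_own = [toy_classifier(row) for row in cptac_own]

print("mismatched scaling:", preds_mismatched)
print("within-cohort scaling:", preds_own)
```

In this toy case the mismatched scaling puts every CPTAC patient in one class, while within-cohort scaling recovers both. Whether within-cohort scaling is appropriate for the real data depends on how the CPTAC matrix was produced, so comparing the per-gene distributions of the two cohorts before predicting is a reasonable first check.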