I downloaded expression (microarray), clinical, and mutation data for the METABRIC breast cancer study from cBioPortal, and I'm planning to analyze the dataset for DEGs among three subsets of individuals, defined by mutational status. I'm not familiar with using median and z-score expression values for downstream analysis. I was initially planning to set up a SummarizedExperiment object and use DESeq2 to analyze the data. Since these expression values are already normalized, I believe I can't go down that path. Can I just run ANOVA (or a Kruskal-Wallis test) on the expression values for each gene across the three subsets?
Any help understanding what exactly these median and z-score expression files are, and how I might proceed with the analysis, would be much appreciated.
Thanks, Kevin. I checked the distribution of the data for each test category and they looked normally distributed. I ran both ANOVA and the Kruskal-Wallis test on the z-scores. I just wanted to make sure that I didn't miss anything. Also, thanks again for your answer on implementing K-W and parametric ANOVA in R. It was very helpful.
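For anyone following along, the per-gene Kruskal-Wallis test mentioned above can be sketched roughly as follows. This is only an illustration in plain Python, not the R code from Kevin's answer; the function names `average_ranks` and `kruskal_wallis_3` and the example values are made up for this sketch. It assumes exactly three groups, so the chi-square approximation has 2 degrees of freedom and the p-value reduces to exp(-H/2).

```python
import math
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_3(a, b, c):
    """Kruskal-Wallis H (tie-corrected) and approximate p-value for three groups."""
    groups = [a, b, c]
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum squared / group size) over the three groups.
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:  # every value identical, no evidence of a difference
        return 0.0, 1.0
    h = max(h / correction, 0.0)  # guard against tiny negative rounding error

    # Chi-square survival function with 2 degrees of freedom is exp(-h / 2).
    return h, math.exp(-h / 2)


# Hypothetical z-scores for two genes in three mutation-status groups.
example = {
    "GENE_A": ([-0.4, 0.1, 0.3], [0.9, 1.2, 1.5], [2.0, 2.4, 2.1]),
    "GENE_B": ([0.2, -0.1, 0.4], [0.0, 0.3, -0.2], [0.1, 0.5, -0.3]),
}
for gene, (g1, g2, g3) in example.items():
    h, p = kruskal_wallis_3(g1, g2, g3)
    print(gene, round(h, 3), round(p, 4))
```

With three groups the rank-based test and the parametric ANOVA are run gene by gene in the same loop; only the statistic changes.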
Sure thing.
Any idea what the median data file that comes along with the download is? Thanks!
Can you clarify what the file-name is?
It says data_expression_median.txt, and a representative summary of the data distribution for one sample looks like this:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
   4.713    5.401    5.659    6.423    7.095   14.464
whereas the summary for the same sample in the z-score file (data_mRNA_median_Zscores.txt) looks like this:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's
 -4.1048  -0.6847  -0.0855  -0.0321   0.6074  11.4517     1006
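The two ranges would be consistent with the median file holding log-scale intensities and the z-score file holding the same values standardized gene by gene, but that reading is my assumption and is not confirmed anywhere in this thread. Under that assumption, the relationship could be sketched as below. The function name `gene_zscores` and the optional `reference` argument are hypothetical; which samples were actually used as the reference for this download is not something I know.

```python
import statistics


def gene_zscores(values, reference=None):
    """Standardize one gene's expression values across samples.

    values:    one expression value per sample, None where missing.
    reference: values that define the mean and SD (assumption: defaults
               to all non-missing entries of `values`).
    """
    source = values if reference is None else reference
    ref = [v for v in source if v is not None]
    mu = statistics.mean(ref)
    sd = statistics.stdev(ref)  # sample SD; needs at least two reference values
    if sd == 0:
        # A gene with no variation cannot be standardized.
        return [None] * len(values)
    return [None if v is None else (v - mu) / sd for v in values]


# Made-up log-scale values for one gene across five samples.
print(gene_zscores([5.4, 5.7, None, 7.1, 6.4]))
```

Because the standardization here runs across samples for each gene, the summary of a single sample (one column) would not be expected to have a mean of exactly 0, which matches the -0.0321 seen above. Missing inputs stay missing, which is one possible source of NA's in such a file.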