Hi everyone,
I have been researching single and integrative analyses of -omic datasets, including gene expression (EXP), copy number alteration (CNA), and methylation (MET), to classify breast cancer patients. I have finished the code (in R), drafted my manuscript, and sent it to my supervisor. My supervisor then asked me two questions, and I honestly do not know how to answer them well.
First, a clear summary of the data: I used two datasets, a discovery set and a validation set, with the same characteristics (as described in the data description file):
EXP data are continuous variables, Z-score transformed, with values ranging from -6 to 11.
CNA data are discrete variables generated using the GISTIC or RAE algorithms, with values ranging from -2 to 2 (-2 = high deletion, -1 = deletion, 0 = copy-neutral, 1 = amplification, 2 = high amplification).
MET data are discrete (binary) variables taking the values 0 or 1 (0 = hypo-methylated, 1 = hyper-methylated).
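A quick way to confirm that both datasets actually match the encodings listed above is to check each matrix's range and distinct values. The sketch below is in Python rather than R, assumes each data type is stored as a genes x samples matrix (a list of rows), and uses small hypothetical toy matrices in place of the real data.

```python
import math

# Encodings as stated in the data description; assumed to hold for
# both the discovery and the validation sets.
CNA_LEVELS = {-2: "high deletion", -1: "deletion", 0: "copy-neutral",
              1: "amplification", 2: "high amplification"}
MET_LEVELS = {0: "hypo-methylated", 1: "hyper-methylated"}


def summarize(name, matrix, allowed=None):
    """Return min, max and number of distinct values of a genes x samples
    matrix (list of rows), skipping NaNs. Raise ValueError if any value
    falls outside `allowed` (when `allowed` is given)."""
    values = [v for row in matrix for v in row if not math.isnan(v)]
    distinct = set(values)
    if allowed is not None and not distinct <= set(allowed):
        unexpected = sorted(distinct - set(allowed))
        raise ValueError(f"{name}: unexpected values {unexpected}")
    return {"data": name, "min": min(values), "max": max(values),
            "n_distinct": len(distinct)}


# Hypothetical toy matrices (2 genes x 3 samples), not the real data.
exp = [[-1.2, 0.3, 2.5], [0.8, -0.4, 10.9]]
cna = [[-1, 0, 2], [0, 1, -2]]
met = [[0, 1, 1], [1, 0, 0]]

for name, m, allowed in [("EXP", exp, None),
                         ("CNA", cna, CNA_LEVELS),
                         ("MET", met, MET_LEVELS)]:
    print(summarize(name, m, allowed))
```

Running the same check on the discovery and validation sets would give the per-dataset ranges for a data summary table.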
Below are my supervisor's questions:
1. Are the gene expression and methylation data normalized? If so, how? For the CNA data, how are gains and losses defined, i.e., what thresholds have been imposed?
--> I did not apply any normalization myself, so I do not know whether I should have (from searching online, it seems I might need a log transformation such as log2 or log10), or whether the data, with the characteristics described above, are acceptable for my analysis.
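Z-score transformation is itself a form of normalization (per-gene standardization), and a log transform is normally applied to raw, non-negative expression values before z-scoring, not to z-scores, which contain negative values. The Python sketch below illustrates that order on hypothetical raw values; it assumes z-scores are computed per gene across all samples, which the data description may not confirm (some pipelines use a different reference set of samples). In R, the per-gene step would correspond to `t(scale(t(x)))`, since `scale()` standardizes columns.

```python
import math
import statistics


def zscore_rows(matrix):
    """Standardize each gene (row) to mean 0 and SD 1 across samples."""
    out = []
    for row in matrix:
        mu = statistics.mean(row)
        sd = statistics.stdev(row)  # sample SD (n - 1 denominator)
        out.append([(x - mu) / sd for x in row])
    return out


def log2_then_zscore(raw_matrix, pseudocount=1.0):
    """Usual order for raw, non-negative values: log2(x + pseudocount)
    first, then per-gene z-scores. `pseudocount` avoids log2(0)."""
    logged = [[math.log2(x + pseudocount) for x in row] for row in raw_matrix]
    return zscore_rows(logged)


# Hypothetical raw (non-negative) expression values: 2 genes x 4 samples.
raw = [[10.0, 50.0, 200.0, 30.0],
       [5.0, 5.0, 8.0, 100.0]]
z = log2_then_zscore(raw)

# Z-scored data contain negative values, so log2 is undefined for them.
try:
    math.log2(min(z[0]))
except ValueError:
    print("log2 is undefined for negative z-scores")
```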
2. As mentioned, I performed single and integrative analyses on the -omic datasets to stratify cancer patients. Both the single and the integrative analyses found two patient subgroups. I then performed survival analysis comparing those two groups: the P-values were 4E-04 and 1E-04 for single clustering on CNA and MET, respectively, and 0.002 for integrative clustering on EXP, CNA, and MET. My supervisor then asked:
Subgroups from the single clustering analyses seem to have a stronger relationship with OS (overall survival) than those from the integrative clustering analysis (judging by the p-values, since the HR for the single clustering analyses is not shown). What is the justification for this?
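A p-value measures the evidence against "no survival difference", not the strength of that difference: it depends on the effect size, the number of patients and events, and how balanced the two subgroups are. The sketch below implements a two-group log-rank test in plain Python (R's `survival::survdiff` performs this test) together with the simple (O1/E1)/(O0/E0) hazard-ratio approximation; that approximation is swapped in for illustration and is not the Cox-model HR that `survival::coxph` would report. The survival data are hypothetical. Duplicating the same data leaves the HR estimate unchanged while the p-value shrinks, which is why HRs with 95% confidence intervals are usually compared across clusterings.

```python
import math


def logrank(times, events, groups):
    """Two-group log-rank test.

    times  : follow-up times
    events : 1 if the event (e.g. death) was observed, 0 if censored
    groups : 0 or 1 (cluster label)
    Returns (chi2, p_value, hr_approx); hr_approx is (O1/E1)/(O0/E0),
    a rough hazard ratio for group 1 relative to group 0.
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    o1 = e1 = var = 0.0
    total_events = 0
    for t in event_times:
        at_risk = [g for tt, g in zip(times, groups) if tt >= t]
        n = len(at_risk)
        n1 = sum(1 for g in at_risk if g == 1)
        d = sum(1 for tt, e in zip(times, events) if tt == t and e == 1)
        d1 = sum(1 for tt, e, g in zip(times, events, groups)
                 if tt == t and e == 1 and g == 1)
        frac = n1 / n
        o1 += d1
        e1 += d * frac  # expected events in group 1 under the null
        if n > 1:       # hypergeometric variance, tie-corrected
            var += d * frac * (1 - frac) * (n - d) / (n - 1)
        total_events += d
    chi2 = (o1 - e1) ** 2 / var
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square, 1 df, upper tail
    o0, e0 = total_events - o1, total_events - e1
    hr_approx = (o1 / e1) / (o0 / e0)
    return chi2, p_value, hr_approx


# Hypothetical survival data: 6 patients per cluster.
times = [5, 8, 12, 15, 20, 24, 2, 3, 6, 7, 10, 14]
events = [1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1]
groups = [0] * 6 + [1] * 6

for k in (1, 4):  # same data, repeated k times
    chi2, p, hr = logrank(times * k, events * k, groups * k)
    print(f"n={len(times) * k}: HR~{hr:.2f}, p={p:.2g}")
```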