Hello
I would like to use the masked copy number segment data from TCGA, available in the Xena browser, and correlate it with gene expression values. Both data sets can be found here: https://xenabrowser.net/datapages/?cohort=GDC%20TCGA%20Liver%20Cancer%20(LIHC)&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443
The Xena web tools themselves allow this sort of correlation and give plausible results: CNV values often do correlate with expression level. On the other hand, I see that in the GISTIC pipeline only genes with a value <= -0.3 or >= 0.3 are considered deleted/amplified.
I have seen a few papers that did something similar, but I would like to know whether a statistical test like this makes sense and, if so, what sort of expression normalization would be best.
What exactly is your statistical test?
Pearson correlation; I forgot to add that.
Pearson correlation can be fooled in many ways =) See https://en.wikipedia.org/wiki/Anscombe%27s_quartet. I would not recommend it unless you are specifically interested in linear correlation and are sure that your data meet its assumptions.
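To make the Anscombe point concrete, here is a minimal sketch in plain Python that computes Pearson's r on the four published Anscombe data sets. Spearman's rank correlation is included only as one possible comparison; that choice is my assumption and not something recommended above. The helper names (`pearson`, `ranks`, `spearman`, `results`) are made up for this illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks, with tied values receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Anscombe's quartet (data sets I-III share the same x values).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Map each data set name to its (Pearson, Spearman) pair.
results = {name: (pearson(x, y), spearman(x, y)) for name, (x, y) in quartet.items()}

for name, (r, rho) in results.items():
    print(f"{name}: Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Pearson's r comes out nearly identical for all four data sets even though their scatter plots look completely different, so it is worth plotting CNV against expression for a few genes before trusting the coefficient.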