Hi all,
TCGA offers a great resource for scientific research.
In the cBio output, the standard representation is a discrete copy-number (CN) state, such as -2, -1, 0, 1 or 2, indicating the deletion, neutral or amplification status of each gene within a CNV region. Two tools, RAE and GISTIC2, are used to call these "lesions".
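To make the idea of discrete CN states concrete, here is a minimal sketch that maps a segment's tumor-over-normal log2 ratio to one of the five states. The cut-offs (`DEEP_DEL`, `SHALLOW_DEL`, `GAIN`, `AMP`) and the function name `discretize` are hypothetical and chosen only for illustration. They are not the thresholds that GISTIC2 or RAE actually use in the TCGA pipeline.

```python
# Hypothetical thresholds, for illustration only; NOT the values used by
# GISTIC2 or RAE in the TCGA/cBio pipeline.
DEEP_DEL = -1.0
SHALLOW_DEL = -0.2
GAIN = 0.2
AMP = 1.0


def discretize(log2_ratio: float) -> int:
    """Map a segment's tumor-over-normal log2 ratio to a discrete CN state."""
    if log2_ratio <= DEEP_DEL:
        return -2  # deep (likely homozygous) deletion
    if log2_ratio <= SHALLOW_DEL:
        return -1  # shallow (likely heterozygous) deletion
    if log2_ratio < GAIN:
        return 0   # copy-neutral
    if log2_ratio < AMP:
        return 1   # low-level gain
    return 2       # high-level amplification


segment_means = [-1.5, -0.4, 0.05, 0.5, 1.8]
print([discretize(x) for x in segment_means])  # [-2, -1, 0, 1, 2]
```

Real callers also account for tumor purity, ploidy and noise when setting such cut-offs, so a fixed threshold table like this is only a rough approximation.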
To generate the tumor-over-normal normalized log2 ratio intensities, the pipeline uses CN intensities from HapMap samples run on the same platform, e.g., Genome-Wide SNP 6.0. The first step of RAE is to calculate these intensities for the tumors, the matched normals, and a separate set of normal HapMap samples. It then computes the tumor-over-normal values by subtracting the median of the HapMapB samples from each tumor and matched-normal sample. DNAcopy is then used to segment these values, and the segments serve as the input to RAE.

My question is: why not compute the tumor-over-normal intensities as tumor - median(matched normals)? Instead, the matched normal samples are only used as a quality control when defining the final ROI (region of interest).
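As a rough sketch of the two normalization schemes being compared, the toy example below assumes that the inputs are already per-probe log2 intensities and that the HapMap reference is the per-probe median across the panel. The array shapes, the values and the function names (`log2_vs_reference_panel`, `log2_vs_matched_normal`) are hypothetical and are not taken from the actual RAE code.

```python
import numpy as np


def log2_vs_reference_panel(sample_log2: np.ndarray, panel_log2: np.ndarray) -> np.ndarray:
    """Per-probe log2 ratio of one sample against the median of a reference panel.

    sample_log2: shape (n_probes,); panel_log2: shape (n_panel_samples, n_probes).
    Subtracting in log2 space is equivalent to dividing the raw intensities.
    """
    return sample_log2 - np.median(panel_log2, axis=0)


def log2_vs_matched_normal(tumor_log2: np.ndarray, normal_log2: np.ndarray) -> np.ndarray:
    """Per-probe log2 ratio of a tumor against its own matched normal."""
    return tumor_log2 - normal_log2


# Toy data: 3 hypothetical HapMap samples x 4 probes (log2 intensities).
hapmap = np.array([[10.0, 11.0, 9.0, 10.5],
                   [10.2, 10.8, 9.2, 10.4],
                   [9.8, 11.2, 8.8, 10.6]])
tumor = np.array([11.0, 11.0, 8.0, 10.5])
normal = np.array([10.1, 11.1, 9.0, 10.5])

tumor_vs_hapmap = log2_vs_reference_panel(tumor, hapmap)    # ~[1.0, 0.0, -1.0, 0.0]
normal_vs_hapmap = log2_vs_reference_panel(normal, hapmap)  # ~[0.1, 0.1, 0.0, 0.0]
tumor_vs_matched = log2_vs_matched_normal(tumor, normal)    # ~[0.9, -0.1, -1.0, 0.0]

# The shared panel median cancels, so the matched-normal ratio equals the
# difference of the two panel-normalized profiles.
print(np.allclose(tumor_vs_hapmap - normal_vs_hapmap, tumor_vs_matched))  # True
```

One thing the toy makes visible is that any deviation the normal shares with the tumor (e.g., a germline CNV) appears in the panel-normalized tumor profile but cancels in the matched-normal ratio. Comparing both panel-normalized profiles retains that information, which may be one reason to keep the matched normal as a separate filter rather than as the divisor. This is a guess, not a statement of the TCGA rationale.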
Another question: for the level 2 CNV (SNP array or aCGH) data in TCGA, what do the log2 ratio intensities represent? Are they tumor-only values or tumor-over-normal ratios? If the latter, which normal reference was used? Is there a wiki page or reference that documents this?
Thanks.
Best.