Hello everyone,
once again I am struggling to match TCGA's data with its description. As the title says, this is about CNV data from SNP arrays.
The wiki page "SNP array-based data" says:
Level 3 data describes regions of the genome that seem to have segmental duplications or deletions in the tumor compared to the normal sample for the patient
To me this sounds like the relative CNV values have been calculated for paired (tumor and control/normal) samples. But then I ask myself: why are normal samples available at all? In the case of somatic mutations, the tumor and matched normal files do not differ (read here), but here they do. Why are matched normal samples provided if they have already been used in the calculation, and why do the case and control files differ?
With all the best,
Mario
Exactly. And for those that are paired with a normal sample, the analysis pipeline is not clear to me.
Let's say that for one patient the region chr1:1000-1200 has a seqMean of 0.2 in the normal sample and 0.7 in the tumor sample. What do I get when I download the T and N samples? According to the SNP array-based data page, I should get the relative expression, e.g. relativeExpression(N, T) = relativeExpression(0.2, 0.7). But then, why do they still provide the control? And to come back to the somatic SNP example: there they solved this issue by providing two identical files for one patient, so the T and N files are the same.
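To make the example concrete, here is a minimal Python sketch of the two readings I can think of. Both function names and both formulas are my own assumptions for illustration, not TCGA's documented pipeline: the first treats the two values as linear-scale signals and takes a log2 ratio, the second treats them as log2 ratios against some common reference and takes their difference.

```python
import math


def paired_log2_ratio(normal: float, tumor: float) -> float:
    """Hypothetical paired value: log2(tumor / normal).

    Assumes both inputs are positive, linear-scale signals.
    """
    if normal <= 0 or tumor <= 0:
        raise ValueError("linear-scale signals must be positive")
    return math.log2(tumor / normal)


def paired_log2_difference(normal_log2: float, tumor_log2: float) -> float:
    """Hypothetical paired value if each input is already a log2 ratio
    against a common reference: the difference tumor - normal."""
    return tumor_log2 - normal_log2


# Values from the example region chr1:1000-1200.
normal_value = 0.2
tumor_value = 0.7

print("reading 1 (log2 ratio):", paired_log2_ratio(normal_value, tumor_value))
print("reading 2 (difference):", paired_log2_difference(normal_value, tumor_value))
```

If the files contain something like either of these paired values, I would expect a single result per patient and no separate normal file, which is exactly what confuses me.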
Long story short: what data do I get in each file?