Hello Biostars
I wanted to tackle the TCGA CGH (comparative genomic hybridization) data from their Glioblastoma Project. I was surprised to find that they compare their tumour and normal samples separately against a reference genome. Nevertheless, for a few cases they actually put the tumour and the normal sample on the same chip, which allows a direct comparison of gains and losses between normal and tumour.
So I can get something like this:
barcode                              chromosome  start     stop      num.mark  seg.mean (log2)
tumour:TCGA-06-0238-01A-02D-0311-04  3           85466920  85652956  24        -0.7886
normal:TCGA-06-0238-10A-01D-0311-04  3           85458029  85652956  25        -0.7479
Given their high similarity, I can assume that there is no loss in the tumour compared to the normal CGH.
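To make "high similarity" concrete, here is a minimal Python sketch that compares the two segments shown above: how much of their span they share, and how far apart their seg.mean values are. The `Segment` class, the helper names and the 0.2 cutoff are my own assumptions for illustration, not something TCGA prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    barcode: str
    chrom: str
    start: int
    stop: int
    num_mark: int
    seg_mean: float  # log2 ratio against the reference


def overlap_bp(a: Segment, b: Segment) -> int:
    """Shared base pairs between two segments (0 if disjoint or on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.stop, b.stop) - max(a.start, b.start))


def overlap_fraction(a: Segment, b: Segment) -> float:
    """Overlap divided by the length of the longer segment."""
    longest = max(a.stop - a.start, b.stop - b.start)
    return overlap_bp(a, b) / longest if longest else 0.0


tumour = Segment("TCGA-06-0238-01A-02D-0311-04", "3", 85466920, 85652956, 24, -0.7886)
normal = Segment("TCGA-06-0238-10A-01D-0311-04", "3", 85458029, 85652956, 25, -0.7479)

diff = tumour.seg_mean - normal.seg_mean   # tumour minus normal, in log2 units
frac = overlap_fraction(tumour, normal)

CUTOFF = 0.2  # assumed, arbitrary threshold for "same copy-number state"
same_state = abs(diff) < CUTOFF

print(f"overlap fraction: {frac:.3f}, log2 difference: {diff:.4f}, same state: {same_state}")
```

With this cutoff the two segments count as the same state, which matches my reading that the loss is present in both samples rather than being tumour-specific.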
But I am not quite sure how I should treat the cases where the segments are very different in size, where the difference between the values is bigger, where there is a loss/gain in the normal but not in the tumour, etc.
I am tempted to just map the segments to genes and do a subtraction, but since they are two different experiments, I doubt this is correct. Any suggestions / thoughts?
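The gene-level subtraction I have in mind would look roughly like the sketch below: each gene gets the overlap-weighted mean of the segments covering it, once per sample, and the normal value is subtracted from the tumour value. The gene name and its coordinates are hypothetical placeholders, and `gene_log2` is a helper I made up for this example.

```python
def gene_log2(segments, chrom, g_start, g_stop):
    """Overlap-weighted mean seg.mean over a gene; None if no segment covers it."""
    covered = 0
    weighted = 0.0
    for s_chrom, s_start, s_stop, seg_mean in segments:
        if s_chrom != chrom:
            continue
        ov = min(s_stop, g_stop) - max(s_start, g_start)
        if ov > 0:
            covered += ov
            weighted += ov * seg_mean
    return weighted / covered if covered else None


# (chromosome, start, stop, seg.mean) taken from the two rows above
tumour_segs = [("3", 85466920, 85652956, -0.7886)]
normal_segs = [("3", 85458029, 85652956, -0.7479)]

# Hypothetical gene interval, for illustration only (not a real annotation)
genes = {"GENE_X": ("3", 85500000, 85600000)}

deltas = {}
for name, (chrom, g_start, g_stop) in genes.items():
    t = gene_log2(tumour_segs, chrom, g_start, g_stop)
    n = gene_log2(normal_segs, chrom, g_start, g_stop)
    # Tumour minus normal; undefined when either sample has no coverage
    deltas[name] = None if t is None or n is None else t - n

print(deltas)
```

Because the two profiles come from separate hybridizations, each carries its own noise and the difference is noisier than either value alone, so I would not read much into small deltas without some threshold or noise model.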
Could you specify which paper explains the reason? I'm also interested to know.
"Actually, reading more carefully supplementary material, I found this paper, which explains why they did not match the normal and tumour samples, and how they analysed it accounting for noise (as Jan Oosting said)"
How do you calculate gains or losses from these TCGA data? I do not know much about SNP array data analysis, so your help would be appreciated.