Hello everybody,
I am trying to analyse CNV data from TCGA to get a measure of overall CNV per patient.
When I download the Level 3 files taken from the SNP6 array, there is a column in the file called Segment_Mean. (Example at bottom.)
What do the numbers in this column represent?
I think they might be log ratios, but the link below makes me wonder whether they are direct estimates of copy number. (If so, it is puzzling that they aren't whole numbers.)
http://www.broadinstitute.org/cancer/software/genepattern/modules/snp6copynumberpipeline
Thanks for any help,
Stephanie
Sample Chromosome Start End Num_Probes Segment_Mean
BONZE_p_TCGAb56_SNP_1N_GenomeWideSNP_6_E04_666936 1 151040529 153927851 1558 0.2031
BONZE_p_TCGAb56_SNP_1N_GenomeWideSNP_6_E04_666936 1 153928595 153929981 2 -2.0772
BONZE_p_TCGAb56_SNP_1N_GenomeWideSNP_6_E04_666936 1 153933585 164456865 7473 0.1883
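For the stated goal of an overall CNV measure per patient, one common summary is the fraction of the genome altered: the share of segmented bases whose |Segment_Mean| exceeds some cutoff. Below is a minimal sketch, assuming Segment_Mean is a log2 ratio, coordinates are 1-based and inclusive, and the file is whitespace-delimited like the example rows. The cutoff of 0.2 and all function names are hypothetical choices, not a TCGA convention.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    sample: str
    chrom: str
    start: int
    end: int
    num_probes: int
    seg_mean: float


def parse_segments(text):
    """Parse whitespace-delimited segment rows, skipping the header and blank lines."""
    segments = []
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields or fields[0] == "Sample":
            continue
        sample, chrom, start, end, n, mean = fields
        segments.append(Segment(sample, chrom, int(start), int(end), int(n), float(mean)))
    return segments


def fraction_genome_altered(segments, cutoff=0.2):
    """Length-weighted fraction of segmented bases with |Segment_Mean| > cutoff.

    The default cutoff is a hypothetical choice; tune it for your data.
    """
    total = altered = 0
    for seg in segments:
        length = seg.end - seg.start + 1  # assumes 1-based inclusive coordinates
        total += length
        if abs(seg.seg_mean) > cutoff:
            altered += length
    return altered / total if total else 0.0


def fga_per_sample(segments, cutoff=0.2):
    """Group segments by sample and compute one summary value per sample."""
    by_sample = defaultdict(list)
    for seg in segments:
        by_sample[seg.sample].append(seg)
    return {s: fraction_genome_altered(segs, cutoff) for s, segs in by_sample.items()}


example = """Sample Chromosome Start End Num_Probes Segment_Mean
BONZE_p_TCGAb56_SNP_1N_GenomeWideSNP_6_E04_666936 1 151040529 153927851 1558 0.2031
BONZE_p_TCGAb56_SNP_1N_GenomeWideSNP_6_E04_666936 1 153928595 153929981 2 -2.0772
BONZE_p_TCGAb56_SNP_1N_GenomeWideSNP_6_E04_666936 1 153933585 164456865 7473 0.1883
"""

segments = parse_segments(example)
print(round(fraction_genome_altered(segments), 4))  # 0.2154
```

Weighting by segment length rather than by segment count keeps tiny segments (like the 2-probe one above) from dominating the summary.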
Thanks for the information.
May I ask why we should also multiply by 2 in "(2^seg_mean)*2" instead of using just "2^seg_mean"? Does this "2" represent the normal intensity?
Right: the assumption is that the normal genome is diploid, so a log2 ratio of 0 (tumor equal to normal) corresponds to a copy number of 2.
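Taking the formula quoted in this exchange, (2^seg_mean)*2, at face value, the conversion and its inverse can be sketched as follows. This assumes Segment_Mean is log2(tumor/normal) with a diploid normal; tumor purity, ploidy and subclonality are ignored, which is one reason the resulting values are not whole numbers. The function names are hypothetical.

```python
import math


def seg_mean_to_copy_number(seg_mean, normal_cn=2.0):
    """Estimate copy number from a log2 ratio, assuming a diploid normal."""
    return normal_cn * 2 ** seg_mean


def copy_number_to_seg_mean(cn, normal_cn=2.0):
    """Inverse of seg_mean_to_copy_number."""
    return math.log2(cn / normal_cn)


for s in (0.2031, -2.0772, 0.1883):
    print(s, round(seg_mean_to_copy_number(s), 2))
# prints 2.3, 0.47 and 2.28 for the three example segments
```

Read this way, the -2.0772 segment looks like a likely deletion (well under one copy), while the other two are only slightly above 2.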
If they are truly log2 ratios of the tumor CN to the normal CN, how can it be that I see the following in the TCGA ACC cohort?
The second line is supposed to represent the matched healthy normal (11A denotes healthy normal tissue) of the same donor as the first line. By your definition, shouldn't this line show 0? Against what is this sample compared to compute the Segment_Mean shown here?
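To sort rows into tumor and matched normal programmatically, the sample-type code in a TCGA barcode (the "11" in "11A") can be parsed. A minimal sketch, assuming full barcodes of the form TCGA-XX-XXXX-11A-...; files that use internal aliquot names such as the BONZE_p_... ones above would need to be mapped to barcodes first. The ranges follow the TCGA sample-type convention (01-09 tumor, 10-19 normal, 20-29 control); the example barcodes are hypothetical.

```python
def _barcode_parts(barcode):
    """Split a TCGA barcode and check that it has a sample field."""
    parts = barcode.split("-")
    if len(parts) < 4 or parts[0] != "TCGA" or not parts[3][:2].isdigit():
        raise ValueError(f"not a TCGA sample barcode: {barcode!r}")
    return parts


def sample_type(barcode):
    """Classify a barcode as tumor, normal or control from its two-digit code."""
    code = int(_barcode_parts(barcode)[3][:2])  # e.g. "11A" -> 11
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    return "unknown"


def patient_id(barcode):
    """First three fields identify the participant, e.g. TCGA-AB-1234."""
    return "-".join(_barcode_parts(barcode)[:3])


tumor = "TCGA-AB-1234-01A-11D-A000-01"   # hypothetical barcode
normal = "TCGA-AB-1234-11A-11D-A000-01"  # hypothetical barcode
print(sample_type(tumor), sample_type(normal), patient_id(tumor) == patient_id(normal))
```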
You'll have to check the metadata or description to see exactly how the files you're using were generated. CN calling can also be done against a reference pool of samples rather than a matched normal. There may also be other files in that data dump that contain the matched tumor/normal data.
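To illustrate the point about calling against a reference pool: if ratios are computed relative to a pooled reference rather than the matched normal, a normal sample can still produce nonzero values wherever it differs from the pool, whether from real germline variation or from noise. A toy sketch, assuming per-probe intensities are available; using the median of the pool as the reference is one common choice and not necessarily what was done for these TCGA files. Segmentation then averages runs of such per-probe ratios into Segment_Mean values.

```python
import math
from statistics import median


def log2_ratio_vs_pool(sample_intensities, pool_intensities):
    """Per-probe log2 ratio of one sample against the median of a reference pool.

    sample_intensities: one value per probe.
    pool_intensities: for each probe, the values observed across the pool samples.
    """
    return [
        math.log2(s / median(ref))
        for s, ref in zip(sample_intensities, pool_intensities)
    ]


# Toy data: three probes, three pool samples per probe (hypothetical values).
sample = [1.0, 2.0, 0.5]
pool = [[1.0, 1.1, 0.9], [1.0, 1.0, 1.0], [1.0, 0.9, 1.1]]
print(log2_ratio_vs_pool(sample, pool))
```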
I had the same question.
Could you please tell me how to transform such segment files into gene-level copy number variation files?
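One straightforward approach is to intersect gene coordinates with the segments and take a length-weighted mean of Segment_Mean over the overlapping segments. A minimal sketch, assuming segments are (chrom, start, end, seg_mean) tuples with 1-based inclusive coordinates on the same genome build as the gene annotation; the gene names and coordinates below are hypothetical, and genes falling in gaps between segments simply get no value.

```python
def gene_level_mean(segments, gene):
    """Length-weighted mean Segment_Mean over segments overlapping a gene.

    segments: iterable of (chrom, start, end, seg_mean), 1-based inclusive.
    gene: (chrom, start, end). Returns None if no segment overlaps the gene.
    """
    g_chrom, g_start, g_end = gene
    weighted_sum = 0.0
    covered = 0
    for chrom, start, end, seg_mean in segments:
        if chrom != g_chrom:
            continue
        overlap = min(end, g_end) - max(start, g_start) + 1
        if overlap > 0:
            weighted_sum += overlap * seg_mean
            covered += overlap
    return weighted_sum / covered if covered else None


segments = [
    ("1", 151040529, 153927851, 0.2031),
    ("1", 153928595, 153929981, -2.0772),
    ("1", 153933585, 164456865, 0.1883),
]
genes = {  # hypothetical gene coordinates for illustration only
    "GENE_A": ("1", 152000000, 152100000),
    "GENE_B": ("1", 153929000, 153934000),
}
for name, gene in genes.items():
    print(name, gene_level_mean(segments, gene))
```

Taking the value of the single segment with the largest overlap is a reasonable alternative when averaging across a breakpoint is undesirable.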
Chris Miller, I have segment means from a methylation array (from copy-number analysis) and want to use them with GISTIC, which requires a "Seg.CN (log2() -1 of copy number)" column in the main input. I'm wondering whether this column can be used directly, since converting it to copy number and back just gives me the original values. I am also not sure whether the copy number mentioned in the GISTIC documentation is absolute CN.
I am in a similar pinch while working with the conumee package.
But at least in theory, one should be able to first convert the segment mean into absolute CN using the method described by Chris, and then simply follow the GISTIC documentation (i.e. convert the absolute CN via log2(absCN) - 1). As you have noticed, this returns exactly the original segment mean, so the column should work as input in its "native" state.
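The equivalence can be checked directly. Writing s for the segment mean and using the diploid conversion CN = 2 · 2^s:

```latex
\log_2(\mathrm{CN}) - 1 = \log_2\!\left(2 \cdot 2^{s}\right) - 1 = (1 + s) - 1 = s
```

So, under the diploid-normal assumption, the value GISTIC asks for is the segment mean itself; if the segment means were computed against some other baseline, that assumption would need rechecking first.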