Question

TCGA/Broad Institute CNV Files Segment Mean

10.2 years ago

dirigible2012 ▴ 320

Hello everybody,

I am trying to analyse CNV data from TCGA to get a measure of overall CNV per patient.

When I download the Level 3 files taken from the SNP6 array, there is a column in the file called Segment_Mean. (Example at bottom.)

What do the numbers in this column represent?

I think they might be log ratios, but the link below makes me wonder whether they are direct estimates of copy number (in which case it is puzzling that they aren't whole numbers).

http://www.broadinstitute.org/cancer/software/genepattern/modules/snp6copynumberpipeline

Thanks for any help,

Stephanie

Sample    Chromosome    Start    End    Num_Probes    Segment_Mean
BONZE_p_TCGAb56_SNP_1N_GenomeWideSNP_6_E04_666936    1    151040529    153927851    1558    0.2031
BONZE_p_TCGAb56_SNP_1N_GenomeWideSNP_6_E04_666936    1    153928595    153929981    2    -2.0772
BONZE_p_TCGAb56_SNP_1N_GenomeWideSNP_6_E04_666936    1    153933585    164456865    7473    0.1883
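As one possible per-patient summary of overall CNV (not something proposed in this thread), you could compute the fraction of segmented bases whose Segment_Mean exceeds a threshold in absolute value. This is a minimal sketch that assumes tab-delimited input with the column names shown above, treats Start/End as inclusive, and uses a hypothetical threshold of 0.2; none of these choices come from TCGA or Broad documentation.

```python
import csv
import io

# Hypothetical cut-off on |log2 ratio| for calling a segment "altered".
# 0.2 is an illustrative assumption, not a TCGA/Broad standard.
ALTERED_THRESHOLD = 0.2


def fraction_altered(seg_text, threshold=ALTERED_THRESHOLD):
    """Per sample, return the fraction of segmented bases whose
    |Segment_Mean| exceeds `threshold` (Start/End treated as inclusive)."""
    reader = csv.DictReader(io.StringIO(seg_text), delimiter="\t")
    totals = {}  # sample -> [altered_bases, total_bases]
    for row in reader:
        length = int(row["End"]) - int(row["Start"]) + 1
        acc = totals.setdefault(row["Sample"], [0, 0])
        acc[1] += length
        if abs(float(row["Segment_Mean"])) > threshold:
            acc[0] += length
    return {sample: alt / tot for sample, (alt, tot) in totals.items()}


# The three example rows from the question, rebuilt as tab-delimited text.
SAMPLE = "BONZE_p_TCGAb56_SNP_1N_GenomeWideSNP_6_E04_666936"
EXAMPLE = "Sample\tChromosome\tStart\tEnd\tNum_Probes\tSegment_Mean\n" + "".join(
    f"{SAMPLE}\t1\t{start}\t{end}\t{probes}\t{mean}\n"
    for start, end, probes, mean in [
        (151040529, 153927851, 1558, 0.2031),
        (153928595, 153929981, 2, -2.0772),
        (153933585, 164456865, 7473, 0.1883),
    ]
)

print(fraction_altered(EXAMPLE))
```

Weighting by segment length rather than counting segments keeps tiny segments (like the 2-probe one above) from dominating the summary.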

7.6 years ago

Zayni1234 • 0

I have a question, if you could please reply:

To convert a sample name such as BUBBY_p_TCGA_b89_105_SNP_N_GenomeWideSNP_6_D10_777410 to a TCGA barcode such as TCGA-2H-A9GF-01A-11D-A37B-01, do we have to do it manually before running GISTIC?

Thanks

7.5 years ago

kingsire • 0

I am also wondering how you converted sample IDs such as FLOUT_p_TCGAb60_SNP_N_GenomeWideSNP_6_C05_681024 to TCGA barcodes such as TCGA-2H-A9GF-01A-11D-A37B-01, which is essential for the next analysis. Could you please describe how you solved this? Thanks a lot.
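The thread does not say where a mapping between Broad array names and TCGA barcodes comes from, so this sketch simply assumes you already have one as a Python dict (built from whatever metadata accompanies your download). It rewrites the Sample column of a tab-delimited segment file and reports names it could not map. The function name and the example IDs are hypothetical.

```python
import csv
import io


def rename_samples(seg_text, id_map):
    """Replace the Sample column of a tab-delimited segment file using
    `id_map` (array name -> TCGA barcode). Unmapped rows are kept as-is
    and their names are returned so they can be checked by hand."""
    reader = csv.DictReader(io.StringIO(seg_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    unmapped = set()
    for row in reader:
        name = row["Sample"]
        if name in id_map:
            row["Sample"] = id_map[name]
        else:
            unmapped.add(name)
        writer.writerow(row)
    return out.getvalue(), unmapped


# Hypothetical mapping, for illustration only.
id_map = {"HYPOTHETICAL_ARRAY_1": "TCGA-XX-0000-01A"}
```

Returning the unmapped names rather than silently dropping rows makes it obvious when the mapping table is incomplete before anything is handed to GISTIC.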

Ram · Accepted Answer · 2014-09-11

10.2 years ago

Chris Miller 22k

Those are log2 ratios of the tumor intensity to the normal intensity. To convert to an absolute copy number, use: (2^seg_mean)*2
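A minimal sketch of that conversion, assuming (as discussed below) a diploid normal genome; the `ploidy` parameter is an illustrative addition, not part of the original formula. Applied to the three rows in the question, it gives roughly 2.30, 0.47 and 2.28, which also shows why the values are not whole numbers: they are ratios of measured intensities, not integer copy-number calls.

```python
def seg_mean_to_absolute_cn(seg_mean, ploidy=2):
    """Convert a log2(tumor/normal) segment mean to absolute copy number.

    Assumes the reference (normal) genome has `ploidy` copies; the default
    of 2 reproduces (2^seg_mean)*2.
    """
    return ploidy * 2 ** seg_mean


for seg_mean in (0.0, 1.0, -1.0):
    print(seg_mean, seg_mean_to_absolute_cn(seg_mean))
# 0.0 2.0
# 1.0 4.0
# -1.0 1.0
```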

Thanks for the information.

May I ask why we should also multiply by 2 in "(2^seg_mean)*2" instead of using just "2^seg_mean"? Does this "2" represent the normal intensity?

written 9.5 years ago by Xinsen Xu ▴ 30

Right: the assumption is that the normal genome is diploid, so a segment mean of 0 corresponds to 2 copies.

written 9.5 years ago by Chris Miller 22k

If they are truly log2 ratios of the tumor CN to the normal CN, how can it be that I see the following in the TCGA ACC cohort?

                         sample chromosome start      end num_probes segment_mean
1: TCGA-2H-A9GF-01A-11D-A37B-01          1 61735 15024591       7600       0.0713
2: TCGA-2H-A9GF-11A-11D-A37E-01          1 61735 17217907       8841       0.0124

The second line is supposed to represent the matched healthy normal (11A denotes healthy normal tissue) from the same donor as the first line. Per your definition, shouldn't this line show 0? What is this sample compared against to compute its segment_mean?
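When filtering a cohort into tumor and normal rows like the two above, the sample type can be read from the barcode itself. This sketch uses the TCGA convention that the first two digits of the fourth dash-separated field are the sample-type code (01-09 tumor, 10-19 normal, 20-29 control); the function name is hypothetical.

```python
def tcga_sample_type(barcode):
    """Classify a TCGA barcode by its two-digit sample-type code
    (first two characters of the fourth dash-separated field)."""
    code = int(barcode.split("-")[3][:2])
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    return "other"


print(tcga_sample_type("TCGA-2H-A9GF-01A-11D-A37B-01"))  # tumor
print(tcga_sample_type("TCGA-2H-A9GF-11A-11D-A37E-01"))  # normal
```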

written 9.0 years ago by Maarten Slagter ▴ 110

You'll have to consult the metadata or file description to see exactly how the files you're using were generated. CN calling can also be done against a reference pool of samples rather than a matched normal. There may also be other files in that data dump that contain the matched tumor/normal data.
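To illustrate why a normal sample need not give exactly 0 when it is compared against a pool, here is a toy sketch that takes the log2 ratio of a single intensity against the median of a reference pool. The function name and the median choice are assumptions for illustration; this is not how the TCGA/Broad pipeline is documented to work, and real pipelines normalize and segment many probes.

```python
import math
import statistics


def log2_ratio_vs_pool(sample_intensity, pool_intensities):
    """Toy log2 ratio of one sample's intensity against the median
    intensity of a reference pool of samples."""
    return math.log2(sample_intensity / statistics.median(pool_intensities))


pool = [90, 100, 110]
print(log2_ratio_vs_pool(100, pool))  # 0.0
print(log2_ratio_vs_pool(105, pool))  # small nonzero value, even for a normal sample
```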

written 9.0 years ago by Chris Miller 22k

I had the same question.

written 7.7 years ago by Ming Tommy Tang ★ 4.5k

Could you please tell me how to transform such segment files into gene-level copy number variation files?
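The thread leaves this unanswered. One straightforward approach, sketched here under stated assumptions, is to give each gene the length-weighted mean Segment_Mean of the segments overlapping it. Gene coordinates must come from an annotation matching your genome build; the gene names and coordinates below are hypothetical, and coordinates are treated as inclusive.

```python
def gene_level_cn(segments, genes):
    """Assign each gene the length-weighted mean segment mean of the
    segments overlapping it (same chromosome, inclusive coordinates).

    segments: list of (chrom, start, end, seg_mean)
    genes:    dict gene_name -> (chrom, start, end)
    Genes with no overlapping segment get None.
    """
    result = {}
    for gene, (g_chrom, g_start, g_end) in genes.items():
        weighted_sum = 0.0
        covered = 0
        for chrom, start, end, seg_mean in segments:
            if chrom != g_chrom:
                continue
            overlap = min(end, g_end) - max(start, g_start) + 1
            if overlap > 0:
                weighted_sum += overlap * seg_mean
                covered += overlap
        result[gene] = weighted_sum / covered if covered else None
    return result


segments = [
    ("1", 151040529, 153927851, 0.2031),
    ("1", 153928595, 153929981, -2.0772),
    ("1", 153933585, 164456865, 0.1883),
]
# Hypothetical genes, for illustration only.
genes = {"GENE_A": ("1", 152000000, 152100000), "GENE_B": ("1", 153929000, 153934584)}
print(gene_level_cn(segments, genes))
```

Weighting by overlap is a design choice; taking the value of the segment with the largest overlap is a simpler alternative when a gene rarely spans a breakpoint.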

written 3.3 years ago by bioinformaticsBoy ▴ 10

Chris Miller, I have segment means from a methylation array (from copy number analysis) and want to use them with GISTIC, which requires a "Seg.CN (log2() -1 of copy number)" column in the main input. I am wondering whether the segment mean column can be used directly, since converting it this way would give back the original values. I am not sure whether the copy number mentioned in the GISTIC documentation is absolute CN.

written 2.2 years ago by Thind amarinder ▴ 340

I am in a similar situation while working with the conumee package.

But at least in theory, one should be able to first convert the segment mean into absolute CN using the method Chris described, and then simply follow the GISTIC documentation (i.e. convert the absolute CN with log2(absCN) - 1). As you have noticed, the two steps cancel out mathematically, so the column should work as input in its "native" state.
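The cancellation can be checked directly: with absolute CN = 2 * 2^s, log2(2 * 2^s) - 1 = (1 + s) - 1 = s. This sketch assumes, as in the thread, that the segment mean is a log2 ratio against a diploid reference; the helper names are hypothetical.

```python
import math


def seg_mean_to_absolute_cn(seg_mean):
    """Absolute CN under the diploid assumption: (2^seg_mean)*2."""
    return 2 * 2 ** seg_mean


def gistic_seg_cn(absolute_cn):
    """Seg.CN as quoted from the GISTIC documentation: log2(CN) - 1."""
    return math.log2(absolute_cn) - 1


for s in (0.2031, -2.0772, 0.1883):
    # Round trip returns the original segment mean (up to float rounding).
    print(s, gistic_seg_cn(seg_mean_to_absolute_cn(s)))
```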

written 12 months ago by sativus ▴ 20