Question

Problem To Understand Copy-Number Values Per Gene Provided By The Broad Institute

4

13.1 years ago

Fred Fleche 4.3k

Dear all,

I wanted to play with and integrate the gene copy number data provided by the Broad Institute and available at: http://www.broadinstitute.org/ccle/data/browseData?conversationPropagation=begin

But when I looked at the DNA Copy Number section and downloaded the set CCLEcopynumberbyGene_2010-10-28.txt.gz, I did not get the data I was expecting.

I was expecting integer values like those provided by the Sanger Institute in the file TCNcelllines_120310.xls.

Could someone have a look at it and explain to me why the values provided are not integers? The values look strange to me because I don't understand their meaning.

I am eager to learn what piece of information I have missed.

Regards,

Fred

Below is a screenshot of the file (Larry, thanks for the suggestion):

[Image: screenshot of the downloaded file]

copynumber data • 15k views

ADD COMMENT • link updated 13.1 years ago by Chris Miller 22k • written 13.1 years ago by Fred Fleche 4.3k

0

Fred, it would be very helpful if you could reproduce a portion of that download here. I'm wondering if the Broad file contains CGH hybridization data, which would then need to be analyzed to give a ratio. That ratio is often near an integer but could be anywhere from 1.8 to 2.2 for 2 copies of a gene/region.

ADD REPLY • link 13.1 years ago by Larry_Parnell 16k

Ram · Answer 1 · 2011-10-26

10

13.1 years ago

Chris Miller 22k

Those are going to be log2 copy number values for each gene, almost certainly after the data is segmented. If you want to roughly calculate the absolute copy number at that position, you can convert out of log2 and round to the nearest integer:

So for one of your values:

log2 cn: 0.4194

This is the log2 of the ratio between tumor and normal (or between your cell line and a panel of "normal" cells).

To convert to absolute copy number, we do

(2^0.4194)*2 = 2.674742

The multiplication by two is because we assume a diploid genome in the normal.

Rounding, you'd probably say this gene was duplicated and there are three copies.

This is not the ideal way to calculate absolute copy number, but given the information you have, I think it'll be about as good as you can get.
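
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the rough conversion described in this answer, assuming the values really are log2 ratios against a diploid reference; the function name `log2_ratio_to_copy_number` and the `normal_ploidy` parameter are made up for this example and are not part of any Broad or CCLE tool.

```python
def log2_ratio_to_copy_number(log2_ratio, normal_ploidy=2):
    """Estimate absolute copy number from a log2(sample/normal) ratio.

    Assumes the reference ("normal") genome has `normal_ploidy` copies,
    i.e. diploid by default, as in the worked example above.
    """
    # Undo the log2 to get the sample/normal ratio, then scale by the
    # assumed number of copies in the normal.
    return normal_ploidy * 2 ** log2_ratio


estimate = log2_ratio_to_copy_number(0.4194)
print(estimate)         # about 2.6747, matching the hand calculation
print(round(estimate))  # 3
```

A log2 ratio of 0 maps back to 2 copies under this assumption, and a value of -1 maps to a single copy. Rounding hides tumor purity and subclonal effects, so treat the integer as a rough call rather than a measurement.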

ADD COMMENT • link updated 4.2 years ago by Ram 44k • written 13.1 years ago by Chris Miller 22k

1

It was the end of a long day, Chris, when I answered this and neglected to first read your more comprehensive answer, +1.

ADD REPLY • link 13.1 years ago by Larry_Parnell 16k

0

Just to update this: there is now better documentation here: https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/CNV_Pipeline/

ADD REPLY • link 5.1 years ago by Kevin Blighe 88k

score 1 · Answer 2 · 2011-10-25

1

13.1 years ago

Larry_Parnell 16k

Fred,

It is difficult to read the table, but what I think I see makes me believe that these are log2 values of the CGH array data. Does that make sense to you?

ADD COMMENT • link 13.1 years ago by Larry_Parnell 16k