Problem To Understand Copy-Number Values Per Gene Provided By The Broad Institute
13.1 years ago

Dear all,

I wanted to explore and integrate the gene copy-number data provided by the Broad Institute, available at: http://www.broadinstitute.org/ccle/data/browseData?conversationPropagation=begin

But when I looked at the DNA Copy Number section and downloaded the set CCLEcopynumberbyGene_2010-10-28.txt.gz, I did not get the data I was expecting.

I was expecting integer values, like those provided by the Sanger Institute in the file TCNcelllines_120310.xls.

Could someone have a look at it and explain to me why the values provided are not integers? The values look strange to me because I don't understand what they mean.

I am eager to learn the piece of information I have missed.

Regards,

Fred

Below is a screenshot of the file (Larry, thanks for the suggestion):

[Image: screenshot of the file, not preserved]

copynumber data • 15k views

Fred, it would be very helpful if you could reproduce a portion of that download here. I'm wondering if the Broad file contains CGH hybridization data, which would then need to be analyzed to give a ratio. That ratio is often near an integer but could be anywhere from 1.8 to 2.2 for 2 copies of a gene/region.

13.1 years ago

Those are going to be log2 copy number values for each gene, almost certainly after the data is segmented. If you want to roughly calculate the absolute copy number at that position, you can convert out of log2 and round to the nearest integer:

So for one of your values:

log2 cn: 0.4194

This is the log2 of the ratio between tumor and normal (or between your cell line and a panel of "normal" cells).

To convert to absolute copy number, we do

(2^0.4194)*2 = 2.674742

The multiplication by two is because we assume a diploid genome in the normal.

Rounding, you'd probably say this gene was duplicated and there are three copies.

This is not the ideal way to calculate absolute copy number, but given the information you have, I think it'll be about as good as you can get.
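For anyone who wants to apply this to a whole column of values, the arithmetic above can be sketched in a few lines of Python. The function name `log2_ratio_to_copy_number` and the `ploidy` parameter are just illustrative names I made up, and the sketch assumes a diploid reference and ignores tumor purity and ploidy corrections, so treat the result as a rough estimate only.

```python
def log2_ratio_to_copy_number(log2_ratio, ploidy=2):
    """Estimate absolute copy number from a log2 copy-number ratio.

    Assumes the reference ("normal") genome has `ploidy` copies
    (2 for a diploid genome). No purity correction is applied.
    """
    # Undo the log2 transform, then scale by the reference copy number.
    return ploidy * 2 ** log2_ratio


# The example value from the file
cn = log2_ratio_to_copy_number(0.4194)
print(round(cn, 4))  # unrounded estimate, about 2.6747
print(round(cn))     # nearest integer copy number, 3
```

A log2 value of 0 maps back to 2 copies (no change), negative values indicate loss, and positive values indicate gain.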


It was the end of a long day, Chris, when I answered this and neglected to first read your more comprehensive answer, +1.


Just to update this: there is now better documentation here: https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/CNV_Pipeline/

13.1 years ago

Fred,

It is difficult to read the table, but what I think I see makes me believe that these are log2 values of the CGH array data. Does that make sense to you?
