But when I looked at the DNA Copy Number section and downloaded the set CCLEcopynumberbyGene_2010-10-28.txt.gz, I did not get the data I was expecting.
I was expecting integer values like those provided in the file TCNcelllines_120310.xls by the Sanger Institute.
Could someone have a look and explain why the values provided are not integers? The values also look strange to me, because I don't understand what they mean.
I am eager to learn the piece of information I have missed.
Regards,
Fred
Below is a screenshot of the file (Larry, thanks for the suggestion):
Fred, it would be very helpful if you could reproduce a portion of that download here. I'm wondering if the Broad file contains CGH hybridization data, which would then need to be analyzed to give a ratio. That ratio is often near an integer but could be 1.8 to 2.2 for 2 copies of a gene/region.
Those are going to be log2 copy number values for each gene, almost certainly after the data is segmented. If you want to roughly calculate the absolute copy number at that position, you can convert out of log2 and round to the nearest integer:
So for one of your values:
log2 cn: 0.4194
This is the log2 of the ratio between tumor and normal (or between your cell line and a panel of "normal" cells).
To convert to absolute copy number, we do
(2^0.4194)*2 = 2.674742
The multiplication by two is because we assume a diploid genome in the normal.
Rounding, you'd probably say this gene was duplicated and there are three copies.
This is not the ideal way to calculate absolute copy number, but given the information you have, I think it'll be about as good as you can get.
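If you want to apply the same arithmetic to many values, here is a minimal Python sketch of the conversion described above. The function name `log2_ratio_to_copy_number` and the `ploidy` parameter are just illustrative names I made up, and the diploid default is the same assumption as in the hand calculation, so treat it as a rough estimate rather than a proper absolute copy number call.

```python
def log2_ratio_to_copy_number(log2_ratio, ploidy=2):
    """Rough absolute copy number from a log2 (sample / normal) ratio.

    Assumes the normal reference has `ploidy` copies (diploid by default).
    """
    # Undo the log2 to get the linear ratio, then scale by the normal ploidy.
    return ploidy * 2 ** log2_ratio


cn = log2_ratio_to_copy_number(0.4194)
print(cn)         # about 2.67
print(round(cn))  # nearest integer copy number: 3
```

A log2 value of 0 gives exactly 2 copies under this assumption, and negative values give fewer than 2.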
It is difficult to read the table, but what I think I see makes me believe that these are log2 values of the CGH array data. Does that make sense to you?