The Oncotype DX 21-gene panel is a popular panel for predicting breast-cancer recurrence and has been cited thousands of times (see DOI: 10.1056/NEJMoa041588). What confuses me is the first step: normalization of the expression levels of the 16 cancer-related genes. Almost every paper states:
For each sample, normalised expression measurements are calculated as the mean cycle threshold (CT) for the 5 reference genes minus the mean CT of triplicate measurements for each individual gene. Normalized expression measurements are scaled from 0 to 15 units, where 1 unit reflects an ~2-fold change in RNA quantity.
Why 0 to 15? Suppose the mean CT value of the 5 reference genes is 28, and the CT for the target gene MKI67 is 32. How would I calculate the normalized expression level of MKI67?
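Taking the quoted formula literally, the calculation for this example is just a difference of two means. A minimal sketch, assuming hypothetical Ct values: the five reference Cts and the MKI67 triplicate below are made up, chosen only so that their means match the 28 and 32 in the question.

```python
from statistics import fmean


def reference_normalised(reference_cts, target_cts):
    """Mean Ct of the reference genes minus the mean Ct of the target gene's replicates."""
    return fmean(reference_cts) - fmean(target_cts)


# Hypothetical Ct values: the five reference genes average 28,
# and the MKI67 triplicate averages 32.
reference_cts = [27.0, 27.5, 28.0, 28.5, 29.0]
mki67_cts = [31.8, 32.0, 32.2]

print(round(reference_normalised(reference_cts, mki67_cts), 2))  # -4.0
```

The result is negative because MKI67 has a higher Ct (i.e. less starting RNA) than the reference average, so this raw value by itself falls outside 0 to 15.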
I believe the mention of the reference-normalised values is more of an observation; it is not that they scaled those values to the 0-15 range. That is, after subtracting the mean of the 3 replicates for each gene of interest from the mean of the 5-gene reference panel, they observed that the values ranged between 0 and 15. They state:
The mention of a 'doubling' of RNA simply reflects the fact that each PCR cycle (ideally) doubles the amount of template, so a difference of one cycle corresponds to roughly a two-fold difference in the starting RNA.
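Converting a reference-normalised value into an approximate fold change makes the 'doubling' explicit. A minimal sketch, assuming the same amplification efficiency for all genes; the `efficiency` parameter is an illustrative addition of mine, where 1.0 means a perfect doubling per cycle.

```python
def fold_change(delta_ct, efficiency=1.0):
    """Approximate RNA quantity of the target relative to the reference average.

    Each cycle multiplies the template by (1 + efficiency); with
    efficiency=1.0 this is a perfect doubling, so the fold change is 2 ** delta_ct.
    """
    return (1.0 + efficiency) ** delta_ct


print(fold_change(1))   # 2.0: one unit higher means ~2-fold more RNA
print(fold_change(-4))  # 0.0625: the MKI67 example, ~1/16 of the reference level
```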
In situations where a negative reference-normalised value is found, the values can be shifted by a constant so that the minimum becomes 0. The resulting range for your particular data may then be 0-8 or 0-23, etc.
------------------------------------------
Note that these reference-normalised values are then used to produce the recurrence score (RS), which ranges from 0 to 100. As the method is proprietary, they do not go into the statistics. However, I imagine that we are talking about a fairly simple regression model here, with the 16 genes' reference-normalised values as predictors and some marker of relapse as the outcome. Predicted probabilities from a logistic regression model lie on the scale 0-1 (or 0-100%).
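To illustrate only the last point (a logistic model's predicted probability lies between 0 and 1 and can be multiplied by 100), here is a hedged sketch. The weights, intercept and input values are hypothetical placeholders, not Oncotype DX coefficients, and this is not the RS algorithm.

```python
import math


def logistic(x):
    """Map any real-valued linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def score_0_100(normalised_values, weights, intercept):
    """Hypothetical score: a logistic-regression probability scaled to 0-100."""
    linear_predictor = intercept + sum(w * v for w, v in zip(weights, normalised_values))
    return 100.0 * logistic(linear_predictor)


# Hypothetical inputs: three reference-normalised values and made-up weights.
values = [6.5, 9.0, 4.2]
weights = [0.3, -0.2, 0.1]
print(round(score_0_100(values, weights, intercept=-1.0), 1))
```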
Kevin
Thank you, Dr. Blighe. Could you please explain this more clearly? What is the value of the specific factor? Is it different for each sample? Thank you so much!
Hey Qian. Thanks for the question. It is just a 'range scaling', whereby they 'compress' or 'expand' the range of the data.
For example, in three steps: generate random data; scale it between 0 and 1; then scale it between 0 and 15.
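The three steps (generate random data, scale between 0 and 1, scale between 0 and 15) can be sketched in Python as follows. This is a minimal sketch assuming ordinary min-max scaling; the normal distribution, sample size and seed are arbitrary choices.

```python
import random


def rescale(values, new_min=0.0, new_max=1.0):
    """Min-max scaling: map the smallest value to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        raise ValueError("cannot rescale values that are all identical")
    return [new_min + (v - lo) / span * (new_max - new_min) for v in values]


# Generate random data (arbitrary normal distribution, fixed seed for reproducibility).
random.seed(1)
data = [random.gauss(5.0, 3.0) for _ in range(20)]

# Scale between 0 and 1.
unit = rescale(data)

# Scale between 0 and 15.
fifteen = rescale(data, 0.0, 15.0)

print(min(unit), max(unit))        # 0.0 1.0
print(min(fifteen), max(fifteen))  # 0.0 15.0
```

Note that min-max scaling changes the size of a unit: after rescaling, a 1-unit difference corresponds to a two-fold change in RNA only if the original values already spanned exactly 15 units.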
Thank you, Kevin. 'Compress' or 'expand' the range of the data? In the original paper (N Engl J Med. 2004 Dec 30;351(27):2817-26):
Any comments? Thanks a lot!
Sorry, I did not answer your question about the factor value: if, for example, the range of values is -5 to +10, then we can add a factor of +5 to the values, which will bring the range to 0 to 15. Without the authors showing the actual formula that they used for this 'range', I will not hypothesise further about what they did. However, as per my original comment above, it could be that the reference-normalised values from their PCR experiment already fell in the range 0 to 15. On that reference-normalised scale, each 1-unit increase reflects roughly a doubling of RNA (on the raw Ct scale it is the opposite: each 1-cycle increase in Ct reflects roughly a halving of the starting RNA).
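For the -5 to +10 example, the shift is just adding a constant equal to minus the minimum. A minimal sketch; the input values are hypothetical.

```python
def shift_to_zero(values):
    """Add a constant so that the smallest value becomes 0; spacing between values is unchanged."""
    offset = -min(values)
    return [v + offset for v in values], offset


# Hypothetical reference-normalised values spanning -5 to +10.
values = [-5.0, -1.5, 2.0, 6.5, 10.0]
shifted, offset = shift_to_zero(values)

print(offset)   # 5.0
print(shifted)  # [0.0, 3.5, 7.0, 11.5, 15.0]
```

Because only a constant is added, differences between values (and hence the ~2-fold-per-unit interpretation) are preserved.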
I do not know much about this; the authors should state where they derived these values from.