Comparison of ssGSEA scores across different datasets

1 day ago

james.zhang20 ▴ 10

Hi everyone,

I would like to predict patient sensitivity to a novel drug from patient molecular characteristics using ssGSEA, and I have some questions.

First, the experimental setup:

We have previously screened a panel of cell lines against a novel drug to obtain IC50 values and classify each cell line as sensitive or insensitive. The transcriptomes of these cell lines are available in CCLE, so we have constructed a gene set whose ssGSEA enrichment scores predict sensitivity to this drug very well in the CCLE cell line dataset.
I would now like to predict which patient samples would be sensitive to this drug.

The method I am trying to implement:

First, run ssGSEA with my custom gene set on the CCLE cell line dataset for which I have IC50 values, and use the resulting scores to find the ssGSEA threshold that best separates sensitive from insensitive cell lines.
Next, run ssGSEA with the same gene set on patient transcriptomic datasets (let's say we have 2 different datasets of 50 and 500 patients, respectively, with a certain cancer).
Using the ssGSEA threshold obtained from the CCLE dataset, classify patients as sensitive or insensitive to the drug. Finally, correlate the sensitivity classification with patient clinical/molecular characteristics (e.g. risk, molecular subtype of disease) to find subsets of patients who are more likely to be sensitive to the drug.
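To make the thresholding step concrete, here is a minimal sketch of one way to do it: pick the cut-off that maximises Youden's J (sensitivity + specificity - 1) on the cell lines, then apply it unchanged to patient scores. The function name `best_threshold`, the scores, and the labels are all made up for illustration, and the sketch assumes that a higher ssGSEA score means more sensitive.

```python
import numpy as np

def best_threshold(scores, is_sensitive):
    """Return (cut-off, J) maximising Youden's J = sensitivity + specificity - 1.

    Assumption: higher ssGSEA scores indicate sensitivity.
    """
    scores = np.asarray(scores, dtype=float)
    is_sensitive = np.asarray(is_sensitive, dtype=bool)
    uniq = np.unique(scores)
    # Candidate cut-offs: midpoints between neighbouring distinct scores.
    candidates = (uniq[:-1] + uniq[1:]) / 2
    best_j, best_cut = -np.inf, None
    for cut in candidates:
        called = scores > cut                    # predicted sensitive
        sens = np.mean(called[is_sensitive])     # true positive rate
        spec = np.mean(~called[~is_sensitive])   # true negative rate
        j = sens + spec - 1
        if j > best_j:
            best_j, best_cut = j, cut
    return best_cut, best_j

# Made-up CCLE scores and IC50-based labels (illustration only).
ccle_scores = np.array([-0.8, -0.5, -0.2, 0.3, 0.6, 0.9])
ccle_sensitive = np.array([False, False, False, True, True, True])
cutoff, youden_j = best_threshold(ccle_scores, ccle_sensitive)

# Apply the CCLE cut-off to made-up patient scores.
patient_scores = np.array([-0.4, 0.1, 0.7])
patient_calls = patient_scores > cutoff
```

Applying the cut-off this way is only meaningful if the patient scores are on the same scale as the CCLE scores, which is exactly the problem described next.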

The issue: I am not sure how to make ssGSEA scores comparable across different datasets. The enrichment scores vary a lot between datasets, and standard ssGSEA normalisation to obtain NES does not address this, because the method normalises by the average enrichment score across the samples in that dataset only. As a result, NES might range from -1 to 1 in the CCLE dataset, from -0.5 to 0.5 in one patient dataset, and from 1 to 2 in the other, so a threshold derived from CCLE cannot be applied directly to the patient datasets. I am considering the following:

Normalise the enrichment scores by the average enrichment score across all samples from all datasets pooled together.
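As a rough sketch of what this pooled normalisation would look like: the function name `pooled_scale` and the raw scores are hypothetical, and the sketch reads "average" as the mean absolute raw score over all pooled samples (an assumption, chosen to avoid dividing by a mean close to zero).

```python
import numpy as np

def pooled_scale(raw_es_by_dataset):
    """Divide every raw enrichment score by one factor computed from all datasets pooled.

    Assumption: the factor is the mean absolute raw score over all samples.
    """
    arrays = {name: np.asarray(v, dtype=float) for name, v in raw_es_by_dataset.items()}
    pooled = np.concatenate(list(arrays.values()))
    factor = np.mean(np.abs(pooled))
    return {name: v / factor for name, v in arrays.items()}

# Made-up raw enrichment scores for two datasets (illustration only).
raw = {
    "ccle": [1000.0, 2000.0, 3000.0],
    "patients_a": [4000.0, 5000.0, 6000.0],
}
scaled = pooled_scale(raw)

# Every sample is divided by the same constant, so the relative gap
# between the two datasets is exactly what it was before scaling.
ratio_before = np.median(raw["patients_a"]) / np.median(raw["ccle"])
ratio_after = np.median(scaled["patients_a"]) / np.median(scaled["ccle"])
```

One thing the sketch makes visible: a single pooled factor puts all scores on one scale, but it does not remove any dataset-specific offset, since the ratio between dataset medians is unchanged.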

I would like to check that this pooled normalisation makes sense from a statistical point of view. Given that ssGSEA is run on individual samples, I think it does? I had also considered taking the NES for each dataset and normalising them by e.g. subtracting a constant to make the ranges the same, although this seems more arbitrary?
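For comparison, the constant-subtraction idea could be sketched as centring each dataset on its own median. The helper `centre_on_median` is hypothetical and the NES values are made up to match the ranges mentioned above. This implicitly assumes each dataset contains a similar mix of sensitive and insensitive samples, which may not hold between cell lines and patients.

```python
import numpy as np

def centre_on_median(scores):
    """Shift one dataset's scores so that its median becomes zero."""
    scores = np.asarray(scores, dtype=float)
    return scores - np.median(scores)

# Made-up NES values spanning the ranges described in the post.
nes = {
    "ccle": np.array([-1.0, -0.5, 0.0, 0.5, 1.0]),
    "patients_a": np.array([-0.5, -0.25, 0.0, 0.25, 0.5]),
    "patients_b": np.array([1.0, 1.25, 1.5, 1.75, 2.0]),
}
centred = {name: centre_on_median(values) for name, values in nes.items()}

# After centring, all medians are zero, but the spreads still differ
# (CCLE spans 2 units, each patient dataset spans 1 unit).
spreads = {name: float(v.max() - v.min()) for name, v in centred.items()}
```

Subtracting a constant aligns the centres only; making the ranges match as well would need an additional per-dataset rescaling, which is a further assumption about the data.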

ssGSEA GSEA RNA-Seq • 72 views
