8.3 years ago
Angel
Hi All,
I have three gene expression signatures, each corresponding to a different subtype. I have RNA-seq data and need to use the three signatures to assign each sample to one of the three subtypes. What methodologies exist?
Naively, I can think of just running GSEA and, based on the GSEA score, calling a sample one subtype and not another. But that is not systematic and can lead to overlaps: for example, a single sample can score high for two signatures if I look at each signature individually.
Is there R code or another known method to handle this situation?
What exactly is a subtype? Subtype of what?
If I understand your question: you have three gene expression signatures, which essentially means you have three vectors of numbers. You then have additional data sets (your RNA-seq samples). Each of them can be classified by whether, or how well, it matches any of the three vectors (signatures). One solution is to measure a distance between each of the three profiles and each of your data sets of interest. Each data set would then be assigned the subtype of whichever signature it resembles most.
About your gene expression signatures: are they one set of genes with a different pattern for each subtype? Or are they different genes that define each subtype?
If they are different gene sets, there's an R library called limma which has a function called geneSetTest that might allow you to determine, for a given set of genes, which data sets show evidence of differential expression (sort of like GSEA). On the other hand, if your "signatures" are patterns, you could use a simple measure such as correlation, or the dist() function, to compare the patterns to your RNA-seq data sets and get a distance to each signature.
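As a rough illustration of the "pattern" case, here is a minimal base-R sketch. It assumes the three signatures are expression profiles over a shared set of genes; the toy data and all object names (`signatures`, `expr`, `truth`, `calls_cor`, `calls_dist`, `margin`) are made up for the example and are not from the question. Each sample is assigned to the signature it correlates with best, or alternatively to the one at the smallest dist() distance.

```r
# Toy data, entirely hypothetical: rows are genes, columns are signatures or samples.
set.seed(1)
genes <- paste0("gene", 1:50)

# Three signature profiles measured over the same genes.
signatures <- matrix(rnorm(50 * 3), nrow = 50,
                     dimnames = list(genes, c("subtypeA", "subtypeB", "subtypeC")))

# Six samples, each a noisy copy of one signature (so the true subtype is known).
truth <- c("subtypeA", "subtypeA", "subtypeB", "subtypeB", "subtypeC", "subtypeC")
expr <- signatures[, truth] + matrix(rnorm(50 * 6, sd = 0.5), nrow = 50)
colnames(expr) <- paste0("sample", 1:6)

# Compare only the genes present in both matrices.
shared <- intersect(rownames(expr), rownames(signatures))

# Option 1: Spearman correlation; the result is a samples x signatures matrix.
cors <- cor(expr[shared, ], signatures[shared, ], method = "spearman")
calls_cor <- colnames(cors)[apply(cors, 1, which.max)]
names(calls_cor) <- rownames(cors)

# Gap between the best and second-best correlation; a small gap flags an ambiguous call.
margin <- apply(cors, 1, function(r) {
  s <- sort(r, decreasing = TRUE)
  unname(s[1] - s[2])
})

# Option 2: Euclidean distance. dist() works on rows, so transpose first,
# then keep only the sample-vs-signature block of the full distance matrix.
d <- as.matrix(dist(t(cbind(expr[shared, ], signatures[shared, ]))))
d <- d[colnames(expr), colnames(signatures)]
calls_dist <- colnames(d)[apply(d, 1, which.min)]
names(calls_dist) <- rownames(d)

print(data.frame(truth, calls_cor, calls_dist, margin = round(margin, 2)))
```

Because every sample gets exactly one call (its best match), this avoids the overlap problem of scoring each signature separately, and the `margin` column gives a way to spot samples that sit between two subtypes. With real data you would probably want to put the samples and signatures on a comparable scale (for example, log-transformed and centered per gene) before computing distances; rank correlation is less sensitive to that.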