Best way to look for gene signature in RNA-seq dataset
12 months ago
Joseph • 0

Hi all,

I'm studying a pathway whose activity is proposed to be reflected by a set of ~20 genes. The gene set was curated from the consensus of several RNA-seq datasets, i.e. these genes are consistently downregulated when the major effector protein is downregulated in these datasets. I would like to apply the gene set to TCGA data. Basically, what I want to do is:

  • retrieve TCGA RNA-seq data
  • find a way to classify the samples into "high" or "low" expression of the 20 genes
  • compare the survival between high/low expression
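For the last step, survival in two groups is commonly compared with a log-rank test. Below is a minimal, dependency-free Python sketch of that test, assuming each sample has a follow-up time, an event flag (1 = death observed, 0 = censored) and a group label (1 = "high", 0 = "low"). The function name `logrank_test` and the toy numbers are made up for illustration and are not TCGA data; for real work an established implementation (e.g. the `survival` package in R) would normally be preferable.

```python
import math

def logrank_test(times, events, groups):
    """Two-group log-rank test.

    times  : follow-up time per sample
    events : 1 if the event (death) was observed, 0 if censored
    groups : 1 for the "high" group, 0 for the "low" group
    Returns (chi-square statistic with 1 df, p-value).
    """
    observed = expected = variance = 0.0
    # Walk through every distinct time at which at least one event occurred.
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        died = [g for ti, e, g in zip(times, events, groups) if ti == t and e == 1]
        n, n1 = len(at_risk), sum(at_risk)   # at risk: total / in group 1
        d, d1 = len(died), sum(died)         # events:  total / in group 1
        observed += d1
        expected += d * n1 / n               # expected events in group 1
        if n > 1:                            # hypergeometric variance term
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed - expected) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Toy example (invented numbers): group 1 has all the early events.
times = [1, 2, 3, 4, 10, 11, 12, 13]
events = [1, 1, 1, 1, 1, 1, 1, 1]
groups = [1, 1, 1, 1, 0, 0, 0, 0]
chi2, p_value = logrank_test(times, events, groups)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

The test only says whether the two survival curves differ; it does not say by how much, so a hazard ratio from a Cox model is often reported alongside it.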

Is there a best way to do that? My current thoughts:

  • test each gene individually and classify samples as high/low by the median (50% of the samples are high, 50% are low)
  • for each sample, count the number of genes with high expression, giving a score; each sample will thus be scored between 0 and 20
  • classify into high versus low (0-10 or 11-20) and compare survival (the group sizes might differ)
  • alternatively, rank samples by gene score, arbitrarily divide them into two halves, and then compare (the group sizes will be more consistent)
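The scoring idea in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: expression is held as a dict mapping each gene to a list of values (one per sample, same order for every gene), "high" means strictly above that gene's median, and the function names and toy values are hypothetical.

```python
from statistics import median

def median_count_scores(expr, signature):
    """Count, per sample, how many signature genes are above their own median.

    expr      : dict gene -> list of expression values, one per sample
    signature : list of gene names (genes missing from expr are skipped)
    """
    genes = [g for g in signature if g in expr]
    n_samples = len(expr[genes[0]])
    scores = [0] * n_samples
    for gene in genes:
        cutoff = median(expr[gene])
        for i, value in enumerate(expr[gene]):
            if value > cutoff:
                scores[i] += 1
    return scores

def split_by_fixed_cutoff(scores, n_genes):
    """'high' if more than half of the genes are high (e.g. 11-20 of 20)."""
    return ["high" if s > n_genes / 2 else "low" for s in scores]

def split_by_rank(scores):
    """Rank samples by score and label the top half 'high'.

    Ties are broken by sample order, which is arbitrary.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = ["low"] * len(scores)
    for i in order[len(scores) // 2:]:
        labels[i] = "high"
    return labels

# Toy data (invented): 3 genes measured in 4 samples.
expr = {
    "A": [1, 2, 3, 4],
    "B": [2, 1, 4, 3],
    "C": [5, 6, 7, 1],
}
scores = median_count_scores(expr, ["A", "B", "C"])
fixed_labels = split_by_fixed_cutoff(scores, n_genes=3)
rank_labels = split_by_rank(scores)
print(scores, fixed_labels, rank_labels)
```

Note that dichotomizing each gene at its median discards how far above or below the median a sample is, which is one reason rank-based or continuous scores are often preferred.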

This approach seems pretty primitive, but as far as I can tell it makes sense. It assumes that the 20 genes contribute equally to the pathway (so far there is no evidence to prove otherwise; the 20 genes seem pretty random and not causally related).

Are there any other suggestions? Ideally I would use either Excel or R for the work (I'm thinking one might do a Monte Carlo simulation, but I'm not sure whether that's the best way to do it, or how).

Thanks in advance! Happy to clarify if anything is unclear.

RNA-seq gene-signature • 715 views
12 months ago
ATpoint 86k

Not Excel. Never Excel. Not reproducible, error-prone, does not scale.

You could calculate a signature score. Check the Bioconductor package UCell. Don't be put off by the fact that it is mostly used for single-cell data; it can be used for any dataset. It calculates the score based on the rankings of the individual genes.
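To give a feel for what a rank-based signature score looks like, here is a simplified Python sketch built on the Mann-Whitney U statistic. It is not UCell's implementation: the normalization is chosen here for illustration, there is no rank cap, ties are broken by order, and the function name `rank_signature_score` is hypothetical. For actual analysis, use the package itself.

```python
def rank_signature_score(sample_expr, signature):
    """Rank-based signature score for a single sample, in [0, 1].

    sample_expr : dict gene -> expression value in this sample
    signature   : list of gene names (genes missing from the sample are skipped)
    A score of 1 means the signature genes are the highest-expressed genes in
    the sample; 0 means they are the lowest-expressed.
    """
    # Rank 1 = highest expression. Ties are broken by dict order (simplification).
    ranked = sorted(sample_expr, key=sample_expr.get, reverse=True)
    rank = {gene: i + 1 for i, gene in enumerate(ranked)}
    sig = [g for g in signature if g in rank]
    n, total = len(sig), len(ranked)
    if n == 0 or n == total:
        raise ValueError("signature must cover some, but not all, genes")
    # Mann-Whitney U: rank sum of signature genes minus its minimum possible value.
    u = sum(rank[g] for g in sig) - n * (n + 1) / 2
    # u ranges from 0 to n * (total - n); rescale so that higher = more expressed.
    return 1 - u / (n * (total - n))

# Toy sample (invented values): four genes in one sample.
sample = {"g1": 10, "g2": 8, "g3": 5, "g4": 1}
top_score = rank_signature_score(sample, ["g1", "g2"])
bottom_score = rank_signature_score(sample, ["g3", "g4"])
mixed_score = rank_signature_score(sample, ["g1", "g4"])
print(top_score, bottom_score, mixed_score)
```

Because the score depends only on the ordering of genes within each sample, it is insensitive to differences in sequencing depth or normalization between samples, and it yields a continuous value that can be split into high/low or used directly as a covariate.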
