Question

Basic idea of applying a gene signature

10.4 years ago

Avro ▴ 160

Hi everyone,

I am a PhD student in biochemistry, and I am learning about gene expression signatures. My lab generated a 36-gene mouse signature. These genes are all highly expressed. I am interested in identifying "mouse-like" human samples from a large set of primary breast tumors.

I was wondering if someone could please give me general guidelines on how to apply a gene signature. I can write code, but I don't understand the principles (I am reading, though). Is it based on the gene names and their fold changes, or just the names? I am sorry for asking such a basic question, but I am still learning this aspect of bioinformatics. I read that a naive Bayes classifier is a good idea? Alternatively, would it make sense to rank the samples (based on how well they express the signature) and use bootstrap resampling?

I would also greatly appreciate being pointed to an earlier post or tutorial.

Thank you!

gene-signature • 3.2k views

updated 2.2 years ago by Ram 45k • written 10.4 years ago by Avro ▴ 160

Ram · Accepted Answer · 2014-12-03

10.4 years ago

Devon Ryan 105k

One possibility (that wouldn't even require writing much code) is to treat this signature as a gene set and run GSEA on the human samples to look for samples in which that set is more highly expressed than expected. The general idea is to perform GSEA on a large number of samples, many of which you expect not to show enrichment, and then look at the resulting distribution of enrichment scores (or p-values). From that, you should be able to get an idea of whether the expression of this set generally follows a normal distribution or whether the distribution is bimodal, meaning that there's a subset of samples that you're going to be very interested in. You could alternatively use resampling there, though I think it'll be quicker and easier to just have a look at the distributions first (nothing is preventing you from doing both).

That's one fairly straightforward possibility, though there are others.
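As a rough illustration of the "score every sample, then look at the distribution" idea, here is a minimal Python sketch. It is not the GSEA software: it uses a simplified, unweighted Kolmogorov-Smirnov-style running sum, whereas GSEA itself uses a weighted statistic and permutation-based significance. The function names, the toy matrix, and the 10-gene signature are all made up for the example.

```python
import numpy as np

def running_sum_score(expression, in_set):
    """Unweighted KS-style enrichment score for one sample.

    expression: 1-D array of expression values, one per gene.
    in_set: boolean array, True where the gene belongs to the signature.
    """
    order = np.argsort(-expression, kind="stable")  # highest expression first
    hits = in_set[order]
    n_hit = hits.sum()
    n_miss = hits.size - n_hit
    # Step up at signature genes, down at all other genes; the walk ends at 0.
    steps = np.where(hits, 1.0 / n_hit, -1.0 / n_miss)
    running = np.cumsum(steps)
    # The score is the running-sum value that deviates most from zero.
    return running[np.argmax(np.abs(running))]

def score_samples(matrix, in_set):
    """Score each column (sample) of a genes x samples matrix."""
    return np.array([running_sum_score(matrix[:, j], in_set)
                     for j in range(matrix.shape[1])])

# Toy data (assumption): 200 genes, 30 samples, first 10 genes are the signature.
rng = np.random.default_rng(0)
matrix = rng.normal(size=(200, 30))
in_set = np.zeros(200, dtype=bool)
in_set[:10] = True
matrix[:10, :5] += 3.0  # pretend the first 5 samples over-express the signature

scores = score_samples(matrix, in_set)
counts, edges = np.histogram(scores, bins=10)
print(counts)  # a cluster of high scores apart from the rest suggests a subset
```

With real data, `matrix` would be the normalized expression table and `in_set` would mark the signature genes; a histogram of `scores` is then the quick first look at whether a distinct subset of samples stands out.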

updated 3.2 years ago by Ram 45k • written 10.4 years ago by Devon Ryan 105k

Thank you! I have just started looking at GSEA's documentation. If I have normalized human Illumina HT-12 v3 gene expression data (breast tumor vs. normal) and a list of the 36 genes, I should be able to run GSEA, right? I am asking because GSEA can be run in different ways. Thank you once again for your help.

written 10.4 years ago by Avro ▴ 160

Yup, that should work!
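For the gene list, GSEA reads gene sets from a tab-delimited GMT file in which each line holds a set name, a description, and then the member genes. A minimal sketch of building such a line in Python follows; the helper name, the set name, and the gene symbols are placeholders, and it assumes the 36 mouse genes have already been mapped to identifiers that match the human expression data.

```python
def gmt_line(set_name, description, genes):
    """Build one GMT record: name, description, then genes, tab-separated."""
    return "\t".join([set_name, description, *genes]) + "\n"

# Placeholder symbols only (assumption); the real list would hold the
# human counterparts of the 36 signature genes.
signature = ["GENE_A", "GENE_B", "GENE_C"]
line = gmt_line("MOUSE_SIGNATURE_36", "na", signature)

# Write the single-set file; the file name is arbitrary.
with open("mouse_signature.gmt", "w") as handle:
    handle.write(line)
```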

written 10.4 years ago by Devon Ryan 105k

Hi, I am a first-year master's student in bioinformatics with a bachelor's degree in molecular biology.

I have a question that seems somewhat relevant to the one that was asked here. I have analysed ChIP-seq data for 100 transcription factors (TFs) of C. elegans by calling targets for each of these factors. Now I have a table with 40k rows (all genes in C. elegans) and 100 columns (all available TFs). Each cell contains a score that reflects how likely the given factor is to affect the given gene, so for each TF I have a ranked list of genes. Besides this table, I also have ten gene sets of different sizes (from 100 to 1000 genes). It may be important to mention that there is no overlap between these gene sets.

The question I seek to answer is which TF is most likely to regulate each of the gene sets. I've realized that I can use GSEA here, but I cannot figure out how exactly it should be applied in this case. Maybe I can use some other implementation of a random walk?

I will appreciate any suggestions and ideas.

Thanks in advance,

Regards
Tim

updated 2.2 years ago by Ram 45k • written 10.0 years ago by Tim Padvitski • 0