GSEA with TCGA data
8.5 years ago
Mike ★ 1.9k

Hello all,

I have TCGA data for the EGFR gene (a two-column file: sample_ID and expression_value) and I want to run gene set enrichment analysis with GSEA. How can I use this file as input to the GSEA tool to see enrichment across different gene sets?

Thanks

genome • 6.4k views

Perhaps I am misunderstanding, but if you have data on only one gene, you will not be able to do gene set enrichment analysis. Could you clarify what you want to do and what data you have?


Actually, I have EGFR expression data from TCGA. I divided the samples into two classes, "Low" and "High", based on the expression value. Now I want to run gene set enrichment analysis for EGFR low vs. high.

I have the following two files:

file 1: exp.gct

1.2

1 400

NAME TCGAsample1 TCGAsample2....... ..sample400

EGFR 0.7859 7.3675 8.0040 ......

file 2: exp.cls

400 2 1

low high

low low low.....
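For reference, here is a minimal sketch of writing the two files in the layout GSEA expects, as far as I understand the Broad file-format documentation: the GCT file starts with `#1.2` and carries a `Description` column after `NAME`, and the second line of the CLS file starts with `#`. The function names and the TP53 row are illustrative assumptions, not part of the files shown above.

```python
def format_gct(genes, samples, matrix, descriptions=None):
    """Return GCT v1.2 text for a genes x samples expression matrix."""
    if descriptions is None:
        descriptions = ["na"] * len(genes)  # placeholder description column
    lines = ["#1.2", f"{len(genes)}\t{len(samples)}"]
    lines.append("\t".join(["NAME", "Description"] + list(samples)))
    for gene, desc, row in zip(genes, descriptions, matrix):
        if len(row) != len(samples):
            raise ValueError(f"{gene}: expected {len(samples)} values, got {len(row)}")
        lines.append("\t".join([gene, desc] + [str(v) for v in row]))
    return "\n".join(lines) + "\n"


def format_cls(labels):
    """Return categorical CLS text; labels hold one class name per sample, in sample order."""
    classes = list(dict.fromkeys(labels))  # unique class names, order of first appearance
    header = f"{len(labels)} {len(classes)} 1"
    names = "# " + " ".join(classes)
    return "\n".join([header, names, " ".join(labels)]) + "\n"


# Toy example with two genes and three samples (values are made up).
gct_text = format_gct(["EGFR", "TP53"], ["s1", "s2", "s3"],
                      [[0.79, 7.37, 8.0], [5.1, 4.9, 5.3]])
cls_text = format_cls(["low", "high", "high"])
```

A real input would have one row per gene for the whole expression matrix rather than the two rows used here.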

8.5 years ago

Gene set analysis needs to have a set of genes. It isn't possible to perform gene set analysis on the expression data of only one gene. The typical process is to have two groups of samples (could be your EGFR HI/LOW groups), perform differential expression on ALL genes, and then do Gene Set Analysis on the resulting differentially-expressed genes.
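The process described above can be sketched roughly as follows. This is only an illustration under stated assumptions: samples are split at the EGFR median, a Welch t-statistic stands in for the differential expression test, and the gene names other than EGFR and all values are made up. In practice a dedicated differential expression package would normally replace the t-statistic.

```python
import math
import statistics


def split_by_marker(marker_values):
    """Label each sample 'high' or 'low' relative to the median of the marker gene."""
    cutoff = statistics.median(marker_values)
    return ["high" if v > cutoff else "low" for v in marker_values]


def welch_t(a, b):
    """Welch's t-statistic for two groups (each needs at least two samples)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def rank_genes(expression, labels):
    """Score every gene (high vs. low) and sort from highest to lowest t-statistic."""
    scores = {}
    for gene, values in expression.items():
        high = [v for v, lab in zip(values, labels) if lab == "high"]
        low = [v for v, lab in zip(values, labels) if lab == "low"]
        scores[gene] = welch_t(high, low)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy matrix: gene -> expression per sample (hypothetical values).
expression = {
    "EGFR":   [1.0, 2.0, 8.0, 9.0],
    "GENE_A": [2.0, 3.0, 7.0, 8.0],  # hypothetical gene that tracks EGFR
    "GENE_B": [9.0, 8.0, 2.0, 1.0],  # hypothetical gene that opposes EGFR
    "GENE_C": [5.0, 6.0, 5.0, 6.0],  # hypothetical unchanged gene
}
labels = split_by_marker(expression["EGFR"])
ranked = rank_genes(expression, labels)
```

Note that EGFR itself will sit at the top of such a list by construction, since the groups were defined by its expression.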

8.5 years ago
TriS ★ 4.7k

if I get your question right...you divided the patients into an EGFR high group and an EGFR low group. this means that you have expression levels for ALL genes in these two groups. you are now trying to use GSEA to evaluate the enrichment of some signatures (e.g. HALLMARK or KEGG or whatever) and see if there is a difference between the two groups that you created. correct?

if that's the case, yes you can/could, but I think there are a couple of caveats. GSEA itself was designed for microarray data while you have RNA-Seq data (I'd guess). you can normalize/analyze your data for GSEA as described here. there is also a Bioconductor package called SeqGSEA that might be closer to what you are looking for.

personally I think that if you normalize and transform your data correctly (and don't use FPKM) you should be fine using those data as input for GSEA.

hope this helps

8.5 years ago
Mike ★ 1.9k

Thanks Sean,

Sorry, I forgot to mention that I also have a third file (C2: curated gene sets downloaded from MSigDB).

But how can I perform differential expression for all genes on the basis of EGFR HI/LOW?

Is it possible or not?

Thanks

8.5 years ago
Mike ★ 1.9k

Thanks TriS,

Yes, you understood my question exactly. My data are mRNA expression z-scores (RNA Seq V2 RSEM), and I divided the tumor samples into high EGFR and low EGFR. BUT I did not include all genes; I have only the EGFR gene. Is that possible, or should I include all genes?

Thanks,


first things first. use the add comment function/link when replying, unless you are actually replying to the main question :)

if by EGFR genes you mean genes that are involved in the EGFR pathway then you don't need to do any functional enrichment analysis because you already know your genes are involved in the EGFR pathway (ok, maybe a few more too). the (very) general point of something like GSEA is to understand what the genes that change the most do and in which direction the pathway goes. this means that you start from a genome-wide experiment, not from a handful of genes. therefore no, I wouldn't use GSEA only on the EGFR genes.
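to make the "genome-wide" point concrete, here is a simplified sketch of the weighted running-sum statistic that GSEA is built around. it is not the GSEA implementation: there is no permutation testing or score normalization, and the function name, gene names and scores are all hypothetical. the point is only that the score depends on where a set's genes fall within the ranking of ALL genes.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """Signed maximum deviation of the running sum over a ranked (gene, score) list."""
    hits = [gene in gene_set for gene, _ in ranked]
    n_miss = len(ranked) - sum(hits)
    hit_total = sum(abs(s) ** weight for (_, s), h in zip(ranked, hits) if h)
    if hit_total == 0 or n_miss == 0:
        return 0.0  # set absent from the list, or it covers the whole list
    running, best = 0.0, 0.0
    for (_, score), is_hit in zip(ranked, hits):
        if is_hit:
            running += abs(score) ** weight / hit_total  # step up, weighted by the score
        else:
            running -= 1.0 / n_miss                      # step down by a constant
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranked list, highest score first (hypothetical genes and scores).
ranked_list = [("g1", 3.0), ("g2", 2.0), ("g3", 1.0),
               ("g4", -1.0), ("g5", -2.0), ("g6", -3.0)]
es_top = enrichment_score(ranked_list, {"g1", "g2"})     # set at the top of the list
es_bottom = enrichment_score(ranked_list, {"g5", "g6"})  # set at the bottom of the list
```

a set concentrated at the top of the ranking gets a positive score and one concentrated at the bottom gets a negative score, which is the "direction" part. with a single gene in the list there is no ranking to walk along.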


Thank you so much, TriS. Yes, you are right.

So first I should include all genes and divide samples based on high/low EGFR, then use GSEA.


That is probably the way to go, yes. As I mentioned, there will need to be a differential expression test to get a ranked list of genes.
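Once each gene has a score from the differential expression test, the ranked list can be saved for a preranked GSEA run. A minimal sketch, assuming the RNK layout is two tab-delimited columns (gene name, ranking score) as I understand it; the function name, gene names and scores here are hypothetical.

```python
def format_rnk(scored_genes):
    """Return 'gene<TAB>score' lines, sorted from highest to lowest score."""
    ordered = sorted(scored_genes, key=lambda kv: kv[1], reverse=True)
    return "".join(f"{gene}\t{score:.4f}\n" for gene, score in ordered)


# Hypothetical per-gene scores from a differential expression test.
rnk_text = format_rnk([("GENE_B", -2.5), ("GENE_A", 3.0), ("GENE_C", 0.1)])
```

The resulting text can be written to a file with a `.rnk` extension and used together with the gene set file.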
