GSEA background list
9.5 years ago
insertname • 0

I am having trouble understanding the GSEA workflow.

Say you have a VCF file with variants and would like to determine whether a gene set associated with some disease is enriched in your gene list.

Would it make sense to define the gene list as the genes that match variants associated with the disease, and the background as all of the variants?

Is that even a valid application?

I am new to bioinformatics, so sorry if my question seems obvious.

Thank you :)

9.5 years ago
ethan.kaufman ▴ 380

What you're proposing doesn't sound like GSEA. Without going into too much detail, GSEA refers to software first published here, and requires a gene list that is comprehensive (a superset of any gene set that you may want to test) and ordered by some numerical variable (usually fold change in expression between two states).

In your case, your gene list is not ordered (it is a set of mutated genes) and not comprehensive (not every gene will have a mutation). Your question is simply whether some gene set (say, genes with the OMIM label "breast cancer") is enriched in your "list" of mutated genes. Well, this can be answered with a straightforward statistical test: Fisher's exact test, which can be done with a calculator, in Excel or R, or with an online tool like DAVID. This test requires a "background list", which is the list of genes you had the potential to find a mutation in. The simplest background list would just be all genes in the genome. However, the background should be considered carefully before applying the test, because it is often the source of mistakes. For example, were all genes sequenced to sufficient depth that a mutation would have been found had one existed? If not, then the genes that were not should be excluded from the background.
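As a rough illustration of the depth question, here is a minimal Python sketch of how a background list could be restricted to genes that were actually callable. The gene names, the coverage values, the `build_background` helper, and the 20x threshold are all made up for the example; a sensible cutoff depends on your sequencing setup and variant caller.

```python
def build_background(mean_depth_by_gene, min_depth=20):
    """Keep only genes covered deeply enough that a variant could have been called.

    min_depth=20 is an arbitrary illustrative threshold, not a recommendation.
    """
    return {gene for gene, depth in mean_depth_by_gene.items() if depth >= min_depth}


# Hypothetical per-gene mean coverage, for illustration only.
mean_depth_by_gene = {
    "BRCA1": 45.0,
    "BRCA2": 38.2,
    "TP53": 12.5,
    "ATM": 60.1,
    "PTEN": 3.0,
}

background = build_background(mean_depth_by_gene, min_depth=20)

# Mutated genes outside the background cannot be tested fairly, so they are
# dropped from the gene list as well.
mutated = {"BRCA1", "TP53", "ATM"}
mutated_testable = mutated & background

print(sorted(background))
print(sorted(mutated_testable))
```

The same intersection should be applied to the gene set being tested, so that all three lists are drawn from the same universe of genes.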

The test compares the fraction of genes that belong to the gene set in your gene list with the same fraction in the background list. If the gene set is enriched, that fraction will be higher in your gene list than in the background.
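To make the comparison concrete, here is a small Python sketch of the one-sided Fisher's exact test for enrichment, written as the upper tail of the hypergeometric distribution using only the standard library. The `enrichment_p_value` function and the toy gene names are invented for illustration; in practice R's `fisher.test` or a tool like DAVID would do the same job.

```python
from math import comb


def enrichment_p_value(gene_list, gene_set, background):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Returns (overlap, p) where p is the probability of seeing at least
    `overlap` gene-set members when drawing len(gene_list) genes at random
    from the background.
    """
    background = set(background)
    # Restrict both lists to the background so every gene is drawn from it.
    gene_list = set(gene_list) & background
    gene_set = set(gene_set) & background

    N = len(background)            # genes that could have been found mutated
    K = len(gene_set)              # of those, genes in the set being tested
    n = len(gene_list)             # genes actually found mutated
    k = len(gene_list & gene_set)  # mutated genes that are in the set

    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return k, tail / comb(N, n)


# Toy data: 20 background genes, 5 of them in the disease gene set.
background = [f"g{i}" for i in range(20)]
gene_set = ["g0", "g1", "g2", "g3", "g4"]
gene_list = ["g0", "g1", "g2", "g10"]  # 3 of the 4 mutated genes are in the set

overlap, p = enrichment_p_value(gene_list, gene_set, background)
print(f"fraction in gene list:  {overlap / len(gene_list):.2f}")
print(f"fraction in background: {len(gene_set) / len(background):.2f}")
print(f"one-sided p-value:      {p:.4f}")
```

Here 3 of 4 genes in the gene list are in the set, against 5 of 20 in the background, and the p-value says how likely an overlap at least that large would be by chance. If you test many gene sets this way, the p-values need a multiple-testing correction.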
