Question

Doublechecking enrichment of promoters of DE genes

0

23 months ago

Aspire ▴ 390

I have performed an enrichment analysis over the promoters of genes that are differentially expressed between two conditions, and there were highly significant results. However, I have seen that random promoters also give significant enrichment results.

Hence, I want to double-check that the results I get are different from the enrichment signature of random promoters.

To get a random list of promoters:

I downloaded the list of all human genes (GRCh38.p14) from BioMart. From this list, I took a subset of 2000 genes (simply the first 2000 genes in alphabetical order).
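Taking the first 2000 names alphabetically is not a uniform draw from the gene list. A minimal sketch of a seeded random sample follows, assuming the gene IDs are already loaded into a Python list. The function name `sample_genes` and the toy IDs are hypothetical, chosen only for illustration.

```python
import random


def sample_genes(gene_ids, n=2000, seed=0):
    """Draw a reproducible uniform random sample of unique gene IDs."""
    # Deduplicate and sort so the result depends only on the seed,
    # not on the order in which the IDs were exported.
    unique = sorted(set(gene_ids))
    rng = random.Random(seed)
    # Never ask for more genes than are available.
    return rng.sample(unique, min(n, len(unique)))


# Toy gene list (hypothetical IDs), standing in for the full export.
all_genes = [f"GENE{i:05d}" for i in range(20000)]
subset = sample_genes(all_genes, n=2000, seed=42)
print(len(subset))
```

Fixing the seed keeps the control list reproducible, so the same background can be reused when the analysis is rerun.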

I have selected the promoters for these 2000 genes via EPD, with default settings (no options selected).

https://epd.expasy.org/epd/EPDnew_select.php

3264 promoters were selected, and I exported a FASTA file spanning -1000 to +100 relative to the TSS.

This was uploaded to SEA (Simple Enrichment Analysis) from the MEME Suite.

The results are here https://meme-suite.org/meme//opal-jobs/appSEA_5.5.517041990574241307765695/sea.html

For example Motif

Upon visual inspection, there is high concordance between the motifs found for my list and those found for the random list.

** If you know the tools, are the default parameter selections reasonable?

** How do I reasonably decide which motifs enriched in my own data are valid, and which are no better than the enrichment results for random genes?

motifs enrichment • 1.5k views

ADD COMMENT • link updated 23 months ago by ATpoint 90k • written 23 months ago by Aspire ▴ 390

2

As others have commented, this is an impressive example of why backgrounds are critical in enrichment analysis. You're currently testing promoters vs. the genome, which of course primarily returns bona fide promoter motifs. As background here, you could use promoters of genes with good evidence of not being differential.

ADD REPLY • link 23 months ago by ATpoint 90k

score 3 · Accepted Answer · 2024-01-02

3

23 months ago

rpolicastro 13k

Generally you want to use a tool like MEME with your background model being all plausible promoters (potentially leaving out the experimental set). This will help reduce the occurrence of motifs common to most promoters. You can also try the differential motif option comparing your experimental set against the set of all plausible promoters (optionally minus your experimental set).
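One way to build the "all plausible promoters minus the experimental set" background is to filter a FASTA file by record ID. The sketch below is only illustrative: it assumes the first whitespace-delimited token of each header is the promoter ID, and the function names and toy records are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def background_records(all_lines, experimental_lines):
    """Keep promoters from the full set whose ID is not in the experimental set."""
    # Assumption: the ID is the first token of the FASTA header.
    exp_ids = {h.split()[0] for h, _ in read_fasta(experimental_lines)}
    return [(h, s) for h, s in read_fasta(all_lines) if h.split()[0] not in exp_ids]


# Toy input (hypothetical promoter IDs and sequences).
all_promoters = [">P1 geneA", "ACGT", "ACGT", ">P2 geneB", "GGCC", ">P3 geneC", "TTAA"]
experimental = [">P2 geneB", "GGCC"]

background = background_records(all_promoters, experimental)
for header, seq in background:
    print(f">{header}\n{seq}")
```

The printed records can be saved to a file and supplied as the control sequences.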

ADD COMMENT • link 23 months ago by rpolicastro 13k

3

Specifically in this case, you want to select "User Provided Sequences" under "Select the type of control sequences to use", and then upload a file of background promoter sequences under "Input the control sequences". As a starting point, you could use the list of random promoters you downloaded if you can't think of anything better.

ADD REPLY • link 23 months ago by i.sudbery 22k

2

Personally, I would use the promoters of genes that were not differentially expressed but ARE expressed in the model of interest. Depending on the treatment, this should give ~3000-7000 genes.
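A minimal sketch of that selection, assuming the differential expression results are available as a list of records. The field names (`gene`, `mean_expr`, `log2fc`, `padj`) and all cutoffs are hypothetical and should be adapted to the actual DE output.

```python
import math


def select_background(results, min_expr=10.0, min_padj=0.1, max_abs_lfc=0.25):
    """Return IDs of genes that are expressed but show no sign of differential expression."""
    background = []
    for row in results:
        padj = row["padj"]
        # Genes dropped by the DE tool's filtering often have a missing padj; skip them.
        if padj is None or math.isnan(padj):
            continue
        expressed = row["mean_expr"] >= min_expr
        unchanged = padj > min_padj and abs(row["log2fc"]) <= max_abs_lfc
        if expressed and unchanged:
            background.append(row["gene"])
    return background


# Toy DE table (hypothetical values).
results = [
    {"gene": "A", "mean_expr": 500.0, "log2fc": 2.1, "padj": 1e-8},         # differential
    {"gene": "B", "mean_expr": 300.0, "log2fc": 0.05, "padj": 0.9},         # background
    {"gene": "C", "mean_expr": 1.0, "log2fc": 0.0, "padj": float("nan")},   # not expressed
    {"gene": "D", "mean_expr": 80.0, "log2fc": -0.1, "padj": 0.6},          # background
]
print(select_background(results))  # → ['B', 'D']
```

Requiring both a high adjusted p-value and a small fold change is a conservative choice here, since a non-significant p-value alone does not show that a gene is unchanged.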

ADD REPLY • link 23 months ago by Trivas ★ 1.9k