Question

Discovering motifs for core promoter elements on a set of enhancer sites

0

9.2 years ago

dally ▴ 210

I have a BED file of enhancer sites that I'd like to run motif analysis on. I'm looking for core promoter elements (if any exist), such as the TATA box, Sp1, Inr, etc.

I came across MEME, and while I admittedly haven't read the entirety of the manual (I'm working on it though!) I thought it would be a good idea to come here and ask for any common pitfalls for this type of analysis.

Specifically, I'm looking for advice to make this analysis statistically and biologically sound. Does the MEME Suite take my BED file of enhancer sites as input, or should I first convert the BED file to FASTA? Which of the MEME Suite tools should I use if my enhancer sites range from 20 bp to 1000 bp? What is the difference between MEME's novel, ungapped motif identifier and GLAM2's novel, gapped motif identifier? Which one would be better suited to this type of analysis?

Thank you!

meme chip-seq motif-analysis • 3.7k views

ADD COMMENT • link updated 2.8 years ago by Ram 45k • written 9.2 years ago by dally ▴ 210

0

PWMs for canonical core promoter elements have already been published. For example, Ohler 2002 (Genome Biology) has several of them. In addition, Vo Ngoc 2017 (Genes & Development) refined the Inr element.

Unless you are looking for novel core promoter elements, I recommend you just use these prior annotations and set the P- or E-value cutoff on your own. Also, in Ohler 2002 (Genome Biology) all of the elements were about 12 nt long. You definitely need to trim them to the 4 to 8 positions with the most information for your analysis.
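One way to do that trimming is to score each position by its information content and keep the best contiguous window. The following is a minimal sketch, assuming a uniform background and a PWM stored as a list of positions, each holding A/C/G/T probabilities. The function names and the toy matrix are hypothetical and are not taken from Ohler 2002.

```python
import math

def column_information(column):
    """Information content in bits of one PWM position (A, C, G, T),
    assuming a uniform background: 2 bits minus the Shannon entropy."""
    total = sum(column)
    entropy = 0.0
    for value in column:
        p = value / total
        if p > 0:
            entropy -= p * math.log2(p)
    return 2.0 - entropy

def trim_pwm(pwm, width):
    """Return (start, sub_matrix) for the contiguous window of `width`
    positions whose summed information content is highest."""
    if not 0 < width <= len(pwm):
        raise ValueError("width must be between 1 and the motif length")
    info = [column_information(col) for col in pwm]
    best_start = max(
        range(len(pwm) - width + 1),
        key=lambda s: sum(info[s:s + width]),
    )
    return best_start, pwm[best_start:best_start + width]

# Toy 6-position matrix (made-up values) with an informative core at 2..4.
toy_pwm = [
    [0.25, 0.25, 0.25, 0.25],
    [0.25, 0.25, 0.25, 0.25],
    [1.00, 0.00, 0.00, 0.00],
    [0.00, 0.00, 0.00, 1.00],
    [0.97, 0.01, 0.01, 0.01],
    [0.25, 0.25, 0.25, 0.25],
]
start, core = trim_pwm(toy_pwm, 3)
print(start, len(core))
```

With a non-uniform genomic background, a relative-entropy score against that background would be a more appropriate measure than the fixed 2-bit ceiling used here.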

Good luck.

ADD REPLY • link 8.1 years ago by maduh17 ▴ 10

Ram · Answer 1 · 2016-02-11

2

9.2 years ago

Fidel ★ 2.0k

You can use CentriMo from the MEME Suite. This is its description:

CentriMo identifies known or user-provided motifs that show a significant preference for particular locations in your sequences. CentriMo can also show if the local enrichment is significant relative to control sequences. See the CentriMo manual for more information.

As input you need a FASTA file containing the sequences you are interested in. You can convert your BED file to a FASTA file using fastaFromBed (also available as bedtools getfasta) from BEDTools.
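To make the coordinate convention explicit, here is a minimal Python sketch of what the BED-to-FASTA conversion does. It assumes the genome is already in memory as a dictionary of chromosome name to sequence, which is only practical for toy data; for a real genome, use fastaFromBed against a genome FASTA file. The function name `bed_to_fasta` is hypothetical.

```python
def bed_to_fasta(bed_lines, genome):
    """Extract the sequence for each BED interval from `genome`, a dict
    mapping chromosome name to sequence string.

    BED coordinates are 0-based and half-open, so they map directly
    onto Python slices: genome[chrom][start:end].
    """
    records = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        sequence = genome[chrom][start:end]
        records.append(f">{chrom}:{start}-{end}\n{sequence}")
    return "\n".join(records) + "\n" if records else ""

# Toy example with a made-up 10 bp "chromosome".
toy_genome = {"chr1": "ACGTACGTAC"}
print(bed_to_fasta(["chr1\t2\t6"], toy_genome), end="")
```

The point to remember is that a BED interval of 2 to 6 covers four bases, not five, so enhancer lengths computed as end minus start match the FASTA sequence lengths.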

ADD COMMENT • link updated 2.8 years ago by Ram 45k • written 9.2 years ago by Fidel ★ 2.0k

0

Hi Fidel. In sequences not larger than 1 Kb, personally I would not particularly focus on motif position.

ADD REPLY • link 9.2 years ago by Anima Mundi ★ 2.9k

0

If anything, this gives me excellent information to go on. I will probably use CentriMo and then compare it to the results from AME/FIMO, which do not focus on motif position.

ADD REPLY • link 9.2 years ago by dally ▴ 210

Ram · Answer 2 · 2016-02-11

1

9.2 years ago

Anima Mundi ★ 2.9k

Hello,

Question 1: yes, you should convert your input file to FASTA.

Question 2: while you can certainly play around with the tools of the suite, in your shoes I would start with MEME.

Question 3: GLAM2's approach is somewhat more ambitious than MEME's, as it tries to identify complex motifs (it tries to identify de novo "meta-motifs" made of units that might be separated by gaps).

Question 4: as above, I would prioritize MEME. Also, you could find existing PSWMs of motifs of interest (e.g. Sp1), convert them to MEME format, and input them, together with your FASTA sequences, to FIMO.
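As an illustration of the conversion step, the sketch below writes a probability matrix as text in the MEME minimal motif format. The layout reflects my understanding of that format and should be checked against the MEME Suite documentation for your version. The function name `to_meme`, the uniform background, and the `nsites` value are assumptions made for the example.

```python
def to_meme(name, pwm, nsites=20):
    """Render one motif as MEME minimal-format text.

    `pwm` is a list of positions, each a list of four values in
    A, C, G, T order. Rows are renormalized to sum to 1. The uniform
    background and `nsites` value are placeholders, not real data.
    """
    lines = [
        "MEME version 4",
        "",
        "ALPHABET= ACGT",
        "",
        "strands: + -",
        "",
        "Background letter frequencies",
        "A 0.25 C 0.25 G 0.25 T 0.25",
        "",
        f"MOTIF {name}",
        f"letter-probability matrix: alength= 4 w= {len(pwm)} nsites= {nsites} E= 0",
    ]
    for row in pwm:
        total = sum(row)
        lines.append(" ".join(f"{value / total:.6f}" for value in row))
    return "\n".join(lines) + "\n"

# Toy two-position motif (made-up counts), printed rather than saved.
print(to_meme("toy_motif", [[8, 0, 0, 0], [1, 1, 1, 1]]), end="")
```

The resulting text can be saved to a file and supplied to FIMO together with the FASTA sequences. The MEME Suite also ships its own converters for common database formats, which are preferable when one exists for your source.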

ADD COMMENT • link updated 2.8 years ago by Ram 45k • written 9.2 years ago by Anima Mundi ★ 2.9k

0

I'll give FIMO a try. I have been having a hard time finding PWMs for my targets of interest in Homo sapiens (I have found only TBP and Sp1). I find it hard to believe there are not already motifs for the others, but JASPAR does not seem to have them, and my search is bringing up nothing.

I will also give MEME a try. I have been running into the error "Your sequence must be at least 8 characters long, remove shorter sequences and re-run", but I'll get it. Thanks for the help!
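That error can be avoided by dropping the short records before running MEME. Here is a minimal sketch in plain Python; the threshold of 8 comes from the error message above, and the function names are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def drop_short(records, min_length=8):
    """Keep only records whose sequence has at least `min_length` characters."""
    return [(h, s) for h, s in records if len(s) >= min_length]

# Toy input: the first record is too short and is removed.
toy_fasta = ">short\nACGT\n>long_enough\nACGTACGT\nAC\n"
for header, sequence in drop_short(read_fasta(toy_fasta)):
    print(f">{header}\n{sequence}")
```

Filtering at the BED stage (end minus start of at least 8) would achieve the same thing before the sequences are extracted.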

ADD REPLY • link updated 2.8 years ago by Ram 45k • written 9.2 years ago by dally ▴ 210

0

Regarding PSWM databases, also check out TRANSFAC (unfortunately mostly not free) and UniPROBE.

ADD REPLY • link 9.2 years ago by Anima Mundi ★ 2.9k

0

It's amazing that none of these have the actual promoter element motifs I'm looking for (at least not the free version of TRANSFAC). I have found a few Drosophila motifs, but since I am working with human data I don't really know how reliable they would be.

ADD REPLY • link 9.2 years ago by dally ▴ 210

0

Yes, they might be unreliable because of differences between your ideal motifs and the Drosophila ones. Also, keep in mind that FIMO is mostly concerned with avoiding type I errors. There are more permissive algorithms, if you are already sufficiently convinced that your motifs have to be there (see MotifViz Possum, for example).

ADD REPLY • link 9.2 years ago by Anima Mundi ★ 2.9k

0

Would you know of a database that contains mammalian promoter sequences with an annotated TATA box? Some of the papers I read mentioned obtaining a list from GenBank, but I'm not having much luck. I want to validate some of the motifs I have found.

ADD REPLY • link 9.2 years ago by dally ▴ 210

0

I do not know of such a database, but you may find what you are looking for by taking TSS positions defined by RNA-seq and similar studies, and then expanding their coordinates to yield a list of narrow ranges (say, 50 bp or less) that are very likely to contain TATA boxes, provided you choose your promoter type carefully (e.g., there are TATA box-less promoters). You can then use that list to get a FASTA file to screen via MEME; this would output a PSWM that you might then use on your promoters. Actually, your list of promoters might already have been obtained by expanding TSS positions, so if you have TSS information you can just trim your promoters in a different way. In general, I suggest you take care over the quality of the sequences you provide to motif-searching algorithms, as the signal-to-noise ratio is very important.
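The coordinate expansion described above can be sketched as follows. This is a minimal, strand-aware example that assumes 0-based TSS positions and BED-style half-open output; the 50 bp default comes from the range suggested above, and the function name `upstream_window` is hypothetical.

```python
def upstream_window(chrom, tss, strand, width=50, chrom_size=None):
    """Return a BED-style (0-based, half-open) interval covering the
    `width` bp immediately upstream of a TSS.

    `tss` is the 0-based position of the transcription start site.
    On the minus strand, "upstream" means higher coordinates.
    """
    if strand == "+":
        start, end = tss - width, tss
    elif strand == "-":
        start, end = tss + 1, tss + 1 + width
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clip to the chromosome boundaries where known.
    start = max(start, 0)
    if chrom_size is not None:
        end = min(end, chrom_size)
    return chrom, start, end, strand

# Toy TSS list (made-up coordinates).
for chrom, tss, strand in [("chr1", 1000, "+"), ("chr1", 1000, "-")]:
    print(*upstream_window(chrom, tss, strand), sep="\t")
```

For minus-strand windows, the extracted sequence should be reverse-complemented before motif search, so that every sequence reads in the direction of transcription.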

ADD REPLY • link 9.2 years ago by Anima Mundi ★ 2.9k