Question

Peak length for TF motif discovery

4.1 years ago

Pappu ★ 2.1k

I have seen motif discovery with MEME or HOMER run on several different inputs: the whole peak sequence, or a window of 100 or 250 bp upstream and downstream of the peak midpoint. Is there a standard protocol for choosing the length? The same question applies to background sequences: should they come from shuffling the peak sequences or from random positions in the genome?

ChIP-Seq • 714 views

score 2 · Answer 1 · 2020-11-19

4.1 years ago

khorms ▴ 230

I don't think there is a standard protocol for this; try different lengths. Regarding the background: if you shuffle, make sure to preserve the dinucleotide content (it is a very important predictor for many biological features). If you choose random positions from the genome, pick ones where (1) there is no peak and (2) the dinucleotide content is close to that of your peaks (for example, for each peak, find one non-peak region with similar dinucleotide content).
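To make the two background options concrete, here is a minimal Python sketch, not a reference implementation. `dinucleotide_shuffle` shuffles a sequence while keeping its exact dinucleotide counts, using the Altschul-Erickson random Eulerian walk; `match_background` pairs each peak with the unused non-peak sequence whose dinucleotide frequencies are closest. All function names are hypothetical, sequences are assumed to be plain uppercase strings, and squared Euclidean distance is only one possible similarity measure.

```python
import random
from collections import Counter, defaultdict

DINUCS = [a + b for a in "ACGT" for b in "ACGT"]

def dinucleotide_counts(seq):
    """Count overlapping dinucleotides in seq."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))

def _reaches(v, target, last_edge):
    """Follow the chosen last edges from v; True if we arrive at target."""
    seen = set()
    while v != target:
        if v in seen:
            return False
        seen.add(v)
        v = last_edge[v]
    return True

def dinucleotide_shuffle(seq, rng=None):
    """Shuffle seq while preserving its dinucleotide counts exactly."""
    rng = rng or random.Random()
    if len(seq) < 3:
        return seq
    # successors[x] lists every base that follows x in the sequence.
    successors = defaultdict(list)
    for a, b in zip(seq, seq[1:]):
        successors[a].append(b)
    last = seq[-1]
    vertices = [v for v in successors if v != last]
    # Pick one "last edge" per base so that every base leads to the final base.
    while True:
        last_edge = {v: rng.choice(successors[v]) for v in vertices}
        if all(_reaches(v, last, last_edge) for v in vertices):
            break
    # Shuffle the remaining edges and keep the chosen last edge at the end.
    order = {}
    for v, succ in successors.items():
        succ = list(succ)
        if v in last_edge:
            succ.remove(last_edge[v])
            rng.shuffle(succ)
            succ.append(last_edge[v])
        else:
            rng.shuffle(succ)
        order[v] = succ
    # Walk the edges in order, starting from the original first base.
    out, used, cur = [seq[0]], defaultdict(int), seq[0]
    for _ in range(len(seq) - 1):
        nxt = order[cur][used[cur]]
        used[cur] += 1
        out.append(nxt)
        cur = nxt
    return "".join(out)

def dinucleotide_freqs(seq):
    """Return the 16 dinucleotide frequencies (ACGT only) as a list."""
    counts = dinucleotide_counts(seq.upper())
    total = sum(counts[d] for d in DINUCS) or 1
    return [counts[d] / total for d in DINUCS]

def match_background(peaks, candidates):
    """For each peak, pick the closest unused candidate (non-peak) sequence."""
    if len(candidates) < len(peaks):
        raise ValueError("need at least one candidate per peak")
    cand_freqs = [dinucleotide_freqs(c) for c in candidates]
    unused = set(range(len(candidates)))
    matched = []
    for peak in peaks:
        pf = dinucleotide_freqs(peak)
        best = min(unused, key=lambda i: sum(
            (x - y) ** 2 for x, y in zip(pf, cand_freqs[i])))
        unused.remove(best)
        matched.append(candidates[best])
    return matched
```

The candidate sequences passed to `match_background` are assumed to have already been drawn from regions without peaks; the sketch does not check this. The greedy matching is order-dependent, so for large peak sets a proper assignment method may give closer matches.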

...or simply run HOMER with default settings, which does exactly that :) Its default length is 200 bp.

ATpoint 86k • 4.1 years ago
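For anyone who wants to try different lengths by hand, as suggested in the answer, the following is a small Python sketch of resizing peaks to a fixed-width window centred on the peak midpoint. The function name `resize_peak` is hypothetical, coordinates are assumed to be 0-based, half-open BED intervals, and the window is only clamped at the chromosome start because chromosome lengths are not known here. As far as I know, HOMER's `findMotifsGenome.pl` exposes the same choice through its `-size` option (200 by default, or `-size given` to use the full peaks).

```python
def resize_peak(chrom, start, end, width=200):
    """Return a window of `width` bp centred on the peak midpoint.

    Assumes 0-based, half-open BED coordinates. The window is shifted
    right if it would start before position 0; chromosome ends are not
    checked because their lengths are not available in this sketch.
    """
    midpoint = (start + end) // 2
    new_start = max(0, midpoint - width // 2)
    return chrom, new_start, new_start + width


def resize_peaks(peaks, width=200):
    """Apply resize_peak to an iterable of (chrom, start, end) tuples."""
    return [resize_peak(chrom, start, end, width) for chrom, start, end in peaks]
```

Calling `resize_peaks(peaks, width=w)` for a few values of `w` (for example 100, 200 and 500) gives interval sets that can each be converted to FASTA and passed to the motif finder for comparison.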