Question

Is It Necessary To Mask Known Repeats With Repeatmasker Before Finding Common Motifs In Sequences?

9

13.5 years ago

Gahoo ▴ 270

Hi,

What I want to know is whether repeats affect the results of motif discovery. Are programs like MEME and Weeder affected? If they are, does masking the repeats solve the problem?

repeatmasker motif • 7.2k views

ADD COMMENT • link updated 13.5 years ago by Farhat ★ 2.9k • written 13.5 years ago by Gahoo ▴ 270

score 6 · Answer 1 · 2011-06-06

6

13.5 years ago

Farhat ★ 2.9k

Repeats will affect de novo motif finding algorithms. Oftentimes, they will present the strongest signal and thus overwhelm the signal from other motifs. One way to mitigate this is to choose a proper background set so that you only find the repeat if it is overrepresented compared to the background. Another is to increase the number of motifs you search for: your top motifs may come from repeats, but you will see other motifs ranked below them.
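
To illustrate the background idea, here is a minimal Python sketch that scores k-mers by their frequency in the target sequences relative to a background set. It is not how MEME or Weeder work internally; the function names, the choice of k=6 and the pseudocount are assumptions made for illustration only.

```python
import math
from collections import Counter


def kmer_counts(seqs, k):
    """Count every k-mer made only of A/C/G/T across a list of sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N etc.
                counts[kmer] += 1
    return counts


def log2_enrichment(fg_seqs, bg_seqs, k=6, pseudocount=1.0):
    """log2(foreground frequency / background frequency) for each
    k-mer seen in the foreground. The pseudocount (an illustrative
    choice) avoids division by zero for k-mers absent from the background."""
    fg = kmer_counts(fg_seqs, k)
    bg = kmer_counts(bg_seqs, k)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    n_kmers = 4 ** k
    scores = {}
    for kmer, count in fg.items():
        fg_freq = (count + pseudocount) / (fg_total + pseudocount * n_kmers)
        bg_freq = (bg[kmer] + pseudocount) / (bg_total + pseudocount * n_kmers)
        scores[kmer] = math.log2(fg_freq / bg_freq)
    return scores
```

A repeat-derived k-mer that is just as common in the background gets a score near or below zero, while a k-mer specific to the target set scores above zero, which is the effect a well-chosen background is meant to achieve.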

ADD COMMENT • link 13.5 years ago by Farhat ★ 2.9k

1

I absolutely agree here. I also use the genome-specific motif counts when running Weeder!

ADD REPLY • link 13.5 years ago by Ian 6.1k

1

There aren't any standards (as far as I know), but ideally the background should come from the same organism and be filtered in the same way. E.g., if you are looking for motifs in promoters, then your background would be random promoters from the same organism.

ADD REPLY • link 13.5 years ago by Farhat ★ 2.9k

0

Thanks for your answer. But how do I know which is the proper background set? Is there any standard for choosing one?

ADD REPLY • link 13.5 years ago by Gahoo ▴ 270

0

For future reference, HOMER's findMotifs commands apply this philosophy automatically.

ADD REPLY • link 7.8 years ago by boczniak767 ▴ 870

score 2 · Answer 2 · 2011-06-06

2

13.5 years ago

Martin A Hansen 3.0k

This really comes down to the type of motif you are expecting to find. I guess you can speed up motif discovery with RepeatMasker (assuming you are working with the human genome or some other higher eukaryote). However, if your motif resides in repeats, you will lose information. You can test with and without RepeatMasker. You might start with a small chromosome.
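
For the with/without comparison, a small Python sketch like the following may help. It assumes the input is soft-masked, i.e. repeats are already in lowercase (as RepeatMasker writes them with its -xsmall option), and produces a hard-masked copy so the same sequences can be run through the motif finder both ways. The function names are hypothetical.

```python
def hard_mask(seq, mask_char="N"):
    """Replace soft-masked (lowercase) bases with mask_char.

    Assumes lowercase marks repeats; uppercase bases are left untouched.
    """
    return "".join(mask_char if base.islower() else base for base in seq)


def masked_fraction(seq):
    """Fraction of the sequence that is soft-masked (lowercase)."""
    if not seq:
        return 0.0
    return sum(base.islower() for base in seq) / len(seq)
```

Checking masked_fraction first gives a rough idea of how much sequence, and potentially how many motif instances, masking would hide.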

ADD COMMENT • link 13.5 years ago by Martin A Hansen 3.0k

0

In fact, I'm working with the rice genome. Nice advice. I will give it a try.

ADD REPLY • link 13.5 years ago by Gahoo ▴ 270

0

As far as I can remember, RepeatMasker was created for the human genome, and was undocumented and unpublished. LINEs and SINEs for rodents were added, and RepeatMasker is pretty safe to use with higher eukaryotes. I don't know about plants.

ADD REPLY • link 13.5 years ago by Martin A Hansen 3.0k

score 2 · Answer 3 · 2011-06-06

2

13.5 years ago

Ian 6.1k

I routinely use Weeder and analyse the 200 bp centred on the summit of my MACS ChIP-seq binding regions. At these sequence lengths I do not mask repeats, primarily because ChIP-seq can report binding in repeat regions that contain functional motifs. Masking would therefore potentially hide useful information.
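
As a rough sketch of the windowing step, the Python function below computes a window centred on a summit position. It assumes the summit is given as a 0-based coordinate and returns a 0-based, half-open interval; the function name and the optional chrom_length clamp are illustrative and not part of MACS or Weeder.

```python
def summit_window(summit, width=200, chrom_length=None):
    """Return (start, end) of a window of the given width centred on summit.

    Coordinates are 0-based, half-open. The window is clipped at position 0
    and, if chrom_length is given, at the chromosome end, so windows near
    the edges come out shorter than width.
    """
    half = width // 2
    start = max(0, summit - half)
    end = summit + half
    if chrom_length is not None:
        end = min(end, chrom_length)
    return start, end
```

For example, summit_window(1000) gives (900, 1100), which can then be used to extract the sequence passed to the motif finder.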

When I used to analyse ChIP-chip data, I looked at larger regions and did not expect binding regions to cover repeats, so I did mask out repeats then.

Have you tried GimmeMotifs yet? :) It uses a consensus of whatever motif discovery tools you like. For some jobs I still stick with Weeder, though.

ADD COMMENT • link 13.5 years ago by Ian 6.1k

0

I'll give it a try. :)

ADD REPLY • link 13.5 years ago by Gahoo ▴ 270

0

I'll give it a try. Thanks for your suggestion.

ADD REPLY • link 13.5 years ago by Gahoo ▴ 270