Is It Necessary To Mask Known Repeats With Repeatmasker Before Finding Common Motifs In Sequences?
13.6 years ago
Gahoo ▴ 270

Hi,

What I want to know is whether repeats affect the results of motif discovery. Are programs like MEME and Weeder affected? And if so, does masking the repeats solve the problem?

repeatmasker motif • 7.3k views
13.6 years ago
Farhat ★ 2.9k

Repeats will affect de novo motif-finding algorithms. Often they present the strongest signal and thus overwhelm the signal from other motifs. One way to mitigate this is to choose a proper background set, so that a repeat is only reported if it is overrepresented compared to the background. Another is to increase the number of motifs you search for: your top motifs may come from repeats, but you will see other motifs below them.
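To make the background idea concrete, here is a toy sketch in Python (not how MEME or Weeder compute significance internally) that ranks k-mers by the log2 ratio of their frequency in the target sequences to their frequency in a background set, using a pseudocount. A repeat-derived k-mer that is just as common in the background scores near zero, while a k-mer specific to the targets rises to the top. The function names, the pseudocount of 1 and the toy sequences are illustrative assumptions.

```python
import math
from collections import Counter


def kmer_counts(seqs, k):
    """Count k-mers across all sequences, skipping any that contain non-ACGT
    characters (e.g. N from hard-masked repeats)."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def enrichment(foreground, background, k, pseudocount=1.0):
    """Rank k-mers by log2(foreground frequency / background frequency).

    The pseudocount is spread over all 4**k possible k-mers, so a k-mer that is
    absent from one set still gets a finite score."""
    fg = kmer_counts(foreground, k)
    bg = kmer_counts(background, k)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    n_possible = 4 ** k
    scores = {}
    for kmer in set(fg) | set(bg):
        f = (fg[kmer] + pseudocount) / (fg_total + pseudocount * n_possible)
        b = (bg[kmer] + pseudocount) / (bg_total + pseudocount * n_possible)
        scores[kmer] = math.log2(f / b)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Poly-A stands in for a repeat shared by targets and background;
    # GATTACA stands in for a target-specific motif.
    targets = ["AAAAAAAAGATTACA", "AAAAAAAACCGATTACA"]
    background = ["AAAAAAAACCCC", "AAAAAAAATTTT"]
    scores = dict(enrichment(targets, background, k=5))
    print(scores["GATTA"], scores["AAAAA"])
```

Running it prints a positive score for GATTA and a slightly negative one for AAAAA: the abundant "repeat" is not enriched once the background is taken into account. In real use the background would be something like random promoters from the same organism.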

I absolutely agree here - I also use the genome-specific motif counts when running Weeder!

There aren't any standards (as far as I know), but ideally the background should come from the same organism and be similarly filtered. E.g., if you are looking for motifs in promoters, then your background would be random promoters from the same organism.
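A minimal sketch of that suggestion, assuming you already have promoter sequences for the whole organism loaded into a dict keyed by gene ID (the dict layout, the gene IDs and the function names are hypothetical): draw random promoters that are not in the target set, and compare mean GC content as a quick check that the two sets are comparable.

```python
import random


def gc_fraction(seq):
    """Fraction of G+C among unambiguous A/C/G/T bases (0.0 if there are none)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def mean_gc(seqs):
    """Mean GC fraction over a collection of sequences."""
    seqs = list(seqs)
    return sum(gc_fraction(s) for s in seqs) / len(seqs) if seqs else 0.0


def sample_background(promoters, target_ids, n, seed=0):
    """Randomly pick n promoters from the same organism, excluding the targets.

    promoters: dict of gene ID -> promoter sequence (hypothetical input format).
    A fixed seed makes the background reproducible between runs."""
    candidates = sorted(set(promoters) - set(target_ids))
    if n > len(candidates):
        raise ValueError(f"only {len(candidates)} non-target promoters available")
    rng = random.Random(seed)
    return {gene: promoters[gene] for gene in rng.sample(candidates, n)}


if __name__ == "__main__":
    promoters = {"g1": "ACGTACGT", "g2": "GGCCGGCC", "g3": "ATATATAT", "g4": "ACGTTTGC"}
    targets = ["g1"]
    background = sample_background(promoters, targets, n=2)
    print(sorted(background))
    print(mean_gc(promoters[g] for g in targets), mean_gc(background.values()))
```

If the targets turn out to have unusual base composition, a GC-matched background would be a natural refinement of this purely random draw.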

Thanks for your answer. But how do I know which background set is the proper one? Is there any standard way to choose it?

For future reference, HOMER's motif-finding commands (findMotifs.pl, findMotifsGenome.pl) apply this philosophy automatically.

13.6 years ago

This really comes down to the type of motif you are expecting to find. I guess you can speed up motif discovery with RepeatMasker (assuming you are working with the human genome or some other higher eukaryote). However, if your motif resides in repeats, you will lose information. You can test both with and without RepeatMasker, perhaps starting with a small chromosome.
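One way to run that comparison, sketched under two assumptions: the sequences are soft-masked (repeats in lowercase, as in UCSC genome FASTA downloads or RepeatMasker output produced with its `-xsmall` option), and the motif is a plain exact-match string rather than a PWM. Hard-masking turns the lowercase bases into N, and the drop in hit counts shows how many motif instances sit inside repeats and would be lost by masking. The function names are illustrative.

```python
def hard_mask(seq):
    """Turn soft-masked (lowercase) bases into N, as in a hard-masked sequence."""
    return "".join("N" if base.islower() else base for base in seq)


def masked_fraction(seq):
    """Fraction of bases that are soft-masked (lowercase)."""
    return sum(base.islower() for base in seq) / len(seq) if seq else 0.0


def count_motif(seq, motif):
    """Count overlapping exact matches of motif on the given strand only.

    Case is ignored, so soft-masked bases still match; N never matches a motif
    written in A/C/G/T."""
    seq, motif = seq.upper(), motif.upper()
    width = len(motif)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == motif)


def compare_masking(seqs, motif):
    """Return (hits without masking, hits after hard-masking repeats)."""
    unmasked = sum(count_motif(s, motif) for s in seqs)
    masked = sum(count_motif(hard_mask(s), motif) for s in seqs)
    return unmasked, masked


if __name__ == "__main__":
    # The first GATTACA lies inside a lowercase (repeat) region, the second does not.
    seqs = ["ACGTgattacaACGT", "GATTACAtttt"]
    print(compare_masking(seqs, "GATTACA"))  # prints (2, 1)
    print([round(masked_fraction(s), 2) for s in seqs])
```

On real data you would run the motif finder itself on both versions; this count only indicates whether a candidate motif is concentrated in repeats.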

In fact, I'm working with the rice genome. Nice advice, I will give it a try.

As far as I can remember, RepeatMasker was created for the human genome, and was undocumented and unpublished. LINEs and SINEs for rodents were added, and RepeatMasker is pretty safe to use with higher eukaryotes. I don't know about plants.

13.6 years ago
Ian 6.1k

I routinely use Weeder and analyse the 200 bp centred on the summit of my MACS ChIP-seq binding regions. At these sequence lengths I do not mask repeats, primarily because ChIP-seq can report binding regions within repeats that contain functional motifs. Masking would therefore potentially hide useful information.

When I used to analyse ChIP-chip data, I looked at larger regions and did not expect the binding regions to lie in repeats, so in that case I did mask out repeats.
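The 200 bp summit-centred windows mentioned above could be built with something like the following sketch. It assumes MACS-style summit BED lines (chrom, 0-based summit position, summit position + 1, peak name, score), which is the layout of MACS2's `*_summits.bed` files; it clips windows at the chromosome start but, lacking chromosome sizes, not at the end. The resulting BED intervals can then be turned into sequences with a tool such as `bedtools getfasta`. The function name and example peaks are hypothetical.

```python
def summit_windows(bed_lines, width=200):
    """Yield (chrom, start, end, name) windows of `width` bp centred on each summit.

    Coordinates are 0-based, half-open, as in BED. Windows are clipped at 0 but
    not at the chromosome end, since chromosome sizes are not known here."""
    half = width // 2
    for line in bed_lines:
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, summit = fields[0], int(fields[1])
        name = fields[3] if len(fields) > 3 else f"{chrom}:{summit}"
        start = max(0, summit - half)
        end = summit + (width - half)
        yield chrom, start, end, name


if __name__ == "__main__":
    example = [
        "chr1\t1000\t1001\tpeak_1\t12.3\n",
        "chr2\t50\t51\tpeak_2\t8.0\n",
    ]
    for window in summit_windows(example):
        print("\t".join(map(str, window)))
```

With a real file, pass an open file handle (e.g. `open("sample_summits.bed")`, a hypothetical path) instead of the example list and redirect the printed lines to a BED file.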

Have you tried GimmeMotifs yet? :) It uses a consensus of whatever motif discovery tools you like. Still, for some jobs I stick with Weeder.

I'll give it a try. :)

I'll give it a try. Thanks for your suggestion.
