Question

Eliminating repetitive motifs found by MEME

0

7.2 years ago

rbronste ▴ 420

Whenever I run MEME, several of the top hits by E-value are pure repeats (often 25 bp or longer). I would like to exclude these from the search and keep only plausible TF motifs. Is there a way to do this? Thanks.

MEME motif ChIP-Seq • 1.6k views

ADD COMMENT • link updated 7.2 years ago by Alex Reynolds 36k • written 7.2 years ago by rbronste ▴ 420

score 2 · Answer 1 · 2017-09-14

2

7.2 years ago

Alex Reynolds 36k

Unless you're looking for de novo motif models, one option is to use Tomtom to rank MEME hits by their similarity to motifs in published TF databases. That should clean things up considerably.
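As a rough sketch of that comparison step, assuming MEME wrote its results to a directory called meme_out and that you have a motif database in MEME format (the file name jaspar_vertebrates.meme is a placeholder, not a file shipped under that name), a Tomtom run might look like this. The thresholds are illustrative starting points rather than recommendations.

```shell
# Hypothetical paths: MEME's text output and a motif database in MEME format.
query=meme_out/meme.txt
db=jaspar_vertebrates.meme

if command -v tomtom >/dev/null 2>&1 && [ -s "$query" ] && [ -s "$db" ]; then
  # -dist pearson: column comparison function; -min-overlap 5: minimum aligned columns
  # -evalue -thresh 10: report matches with E-value <= 10
  tomtom -oc tomtom_out -min-overlap 5 -dist pearson -evalue -thresh 10 "$query" "$db"
else
  echo "tomtom, $query, or $db not found; skipping" >&2
fi
```

MEME motifs with no reported match to a known TF motif are the ones to look at first when hunting for repeat-driven hits.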

Unless you're looking for really long binding sites (dimers, say), you could also set the -maxw parameter in MEME so that you only search for sites shorter than 25 nt. Tuning other parameters may also help.
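A minimal sketch of a width-capped MEME run, assuming the input sequences are in a file called peaks.fa (a placeholder name). The cap of 20, the minimum width of 6, the zoops model, and the motif count are all assumed values to adjust for your data.

```shell
# Hypothetical names: peaks.fa (input sequences) and meme_out (output directory).
input=peaks.fa
outdir=meme_out
maxw=20   # assumed cap, chosen to sit below the ~25 bp repeats

if command -v meme >/dev/null 2>&1 && [ -s "$input" ]; then
  # -dna: DNA alphabet; -revcomp: search both strands
  # -mod zoops: zero or one site per sequence; -nmotifs 5: stop after five motifs
  meme "$input" -dna -revcomp -mod zoops -nmotifs 5 -minw 6 -maxw "$maxw" -oc "$outdir"
else
  echo "meme or $input not found; skipping" >&2
fi
```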

Another option is to adjust your background, by removing sequences from repeat-masked regions.

ADD COMMENT • link 7.2 years ago by Alex Reynolds 36k

0

Great suggestions. Do you know of a good source to obtain a mouse (mm10) bed file of repeat-masked regions? UCSC I guess?

ADD REPLY • link 7.2 years ago by rbronste ▴ 420

0

UCSC would be my first stop. Others might suggest Biomart, maybe.

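# Convert the UCSC RepeatMasker table to sorted BED6 (chrom, start, end, repeat name, SW score, strand)
$ wget -qO- http://hgdownload.cse.ucsc.edu/goldenPath/mm10/database/rmsk.txt.gz | gunzip -c | awk -v OFS="\t" '{ print $6,$7,$8,$11,$2,$10; }' | sort-bed - > rmsk.bed
# Merge overlapping repeat annotations into non-overlapping intervals
$ bedops --merge rmsk.bed > rmsk.mergedRegions.bed
# Keep only the parts of myRegions.bed (which must be sort-bed sorted) that do not overlap a repeat
$ bedops --difference myRegions.bed rmsk.mergedRegions.bed > myRegions.masked.bed
# Convert the remaining intervals to FASTA (fill in the script's options)
$ bed2faidx.pl --options... < myRegions.masked.bed > myRegions.masked.fa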

Etc.
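To see what the awk step in the pipeline above does, here is a small sketch that runs the same column selection on a single made-up row laid out like rmsk.txt. The row values and the toy_rmsk file names are illustrative only, not real annotation data.

```shell
# One made-up row in the rmsk.txt column layout (values are illustrative only).
printf '607\t687\t174\t0\t0\tchr1\t3000000\t3000097\t-192471874\t-\tL1_Mus3\tLINE\tL1\t-3055\t3592\t3466\t1\n' > toy_rmsk.txt

# Same column selection as the pipeline above:
# $6=genoName, $7=genoStart, $8=genoEnd, $11=repName, $2=swScore, $10=strand
awk -v OFS="\t" '{ print $6,$7,$8,$11,$2,$10; }' toy_rmsk.txt > toy_rmsk.bed

# Show the resulting BED6 line
cat toy_rmsk.bed
```

The output is a six-column BED line: chromosome, start, end, repeat name, Smith-Waterman score, and strand.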

ADD REPLY • link 7.2 years ago by Alex Reynolds 36k