Hi everybody,
As part of some recent analyses using the Illumina 450k DNA methylation microarrays, I have been running the MEME suite to find significant motifs in subsets of Differentially Methylated Probes (DMPs). The problem is that the results look strange: the same motifs keep coming out again and again.
Using the FDb.InfiniumMethylation.hg19 and BSgenome.Hsapiens.UCSC.hg19 R/Bioconductor packages, I extract 200 bp DNA sequences centered on each probe being processed and save them as FASTA files. I then feed these files to MEME and wait for the results.
Some motifs appeared for every subset we tested. Specifically, the most common motifs were runs of a single repeated nucleotide (polyA, polyC, polyG, polyT). This raised some suspicions, so we ran the same motif-finding procedure on two subsets containing 300 and 1000 randomly chosen 450k probes. The same motifs appeared again.
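One sanity check that does not depend on MEME is to count how many of the 200 bp sequences actually contain a long single-nucleotide run. Below is a minimal sketch in plain Python, not something from the original analysis. The function names and the `min_run=8` threshold are arbitrary choices for illustration, and the input is assumed to be the lines of one of the FASTA files described above.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from the lines of a FASTA file."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def homopolymer_fraction(seqs: Iterable[str], min_run: int = 8) -> Dict[str, float]:
    """Fraction of sequences containing a run of >= min_run identical bases,
    reported separately for A, C, G and T.

    min_run=8 is an arbitrary illustrative threshold, not a MEME setting.
    """
    # One base followed by at least (min_run - 1) copies of itself.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_run - 1))
    seq_list: List[str] = [s.upper() for s in seqs]
    counts = {base: 0 for base in "ACGT"}
    for seq in seq_list:
        # Count each base at most once per sequence.
        for base in {m.group(1) for m in pattern.finditer(seq)}:
            counts[base] += 1
    n = len(seq_list)
    return {base: (counts[base] / n if n else 0.0) for base in "ACGT"}


def homopolymer_fraction_from_fasta(lines: Iterable[str], min_run: int = 8) -> Dict[str, float]:
    """Convenience wrapper: parse FASTA lines, then compute the fractions."""
    return homopolymer_fraction((seq for _, seq in read_fasta(lines)), min_run)
```

Running this on the DMP subsets and on the random subsets would show whether the homopolymer runs are equally frequent in both. If they are, the motifs reflect the sequence context of the probes in general rather than anything specific to the DMPs.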
So, it seems that those motifs are somehow present around the 450k probes. Is this a probe design consequence? I am also wondering if the MEME parameters could be behind these results. I am currently running with the following options:
meme {input.fasta} -dna -nmotifs 10 -evt 0.01 -maxw 50 -maxsize 10000000
In particular, I wonder whether the prior distribution of nucleotides in the vicinity of 450k probes fails to meet the statistical assumptions of the MEME algorithm.
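To get a first impression of how skewed the nucleotide distribution around the probes is, the pooled base frequencies of the input sequences can be compared with a uniform 25% per base. This is a rough sketch under my own assumptions: `base_composition` is a hypothetical helper, and it only computes order-0 frequencies for inspection. It does not reproduce whatever background model MEME builds internally or reads through its `-bfile` option.

```python
from collections import Counter
from typing import Dict, Iterable


def base_composition(seqs: Iterable[str]) -> Dict[str, float]:
    """Pooled A/C/G/T frequencies over all sequences.

    Ambiguous characters such as N are ignored. This is a plain order-0
    summary for inspection only (an assumption of this sketch).
    """
    counts: Counter = Counter()
    for seq in seqs:
        counts.update(ch for ch in seq.upper() if ch in "ACGT")
    total = sum(counts.values())
    return {base: (counts[base] / total if total else 0.0) for base in "ACGT"}


def gc_content(seqs: Iterable[str]) -> float:
    """Pooled G+C fraction, derived from base_composition."""
    freqs = base_composition(seqs)
    return freqs["G"] + freqs["C"]
```

A composition far from uniform, for example a strongly GC-rich set of sequences, would suggest that supplying MEME with a background model estimated from comparable genomic sequence is worth trying.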
Has anybody here experienced a similar problem? Any help or hint would be much, much appreciated.
EDIT: I am including a capture of MEME's output to show what the motifs look like: