Hello there,
I was using findMotifsGenome.pl to do motif enrichment analysis based on specific genome regions. Here is my script:
findMotifsGenome.pl homer_peaks_4_parts.txt hg38.fa MotifOutput_given -size given -mask
Because this is DNA methylation RRBS data, there is no peak information. To match motifs across each entire region, I used the parameter -size given. It runs fine, but the results are not good.
My question is: can I choose a specific motif family and focus only on a small cluster of motifs?
The parameters would be something like this:
findMotifsGenome.pl homer_peaks_4_parts.txt hg38.fa MotifOutput_given -size given -mask
But the HOMER motif format is quite different from the JASPAR motif format. I can only select or extract specific motifs from the HOMER folder, and in that case there are not enough of the motifs I need. So how can I get motifs from JASPAR in a matching format, or convert the format?
Thanks in advance.
Anyone here?
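On the JASPAR format question: as far as I know, a HOMER motif file has a header line of the form ">consensus<TAB>name<TAB>log-odds detection threshold", followed by one row per motif position with tab-separated A, C, G, T probabilities. Below is a minimal Python sketch of a converter from JASPAR-format count matrices, not an official HOMER or JASPAR tool. The names parse_jaspar, to_homer, pseudocount and threshold_fraction are made up for illustration. The detection threshold in particular is only a guess (a fixed fraction of the best possible log-odds score) and would need tuning per motif.

```python
import math
import re


def parse_jaspar(text):
    """Parse JASPAR-format text into a list of (name, counts) tuples.

    counts maps each base (A, C, G, T) to a list of per-position counts.
    """
    motifs = []
    name, counts = None, {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                motifs.append((name, counts))
            # Join matrix ID and TF name, e.g. "MA0004.1 Arnt" -> "MA0004.1_Arnt"
            name, counts = "_".join(line[1:].split()), {}
        else:
            base = line[0].upper()
            counts[base] = [float(x) for x in re.findall(r"\d+(?:\.\d+)?", line[1:])]
    if name is not None:
        motifs.append((name, counts))
    return motifs


def to_homer(name, counts, pseudocount=0.1, threshold_fraction=0.6):
    """Render one motif as HOMER-style text.

    ASSUMPTION: the threshold is threshold_fraction times the maximum
    log-odds score against a uniform 0.25 background. This is a
    placeholder heuristic, not a value HOMER itself would compute.
    """
    rows, consensus, max_score = [], [], 0.0
    for i in range(len(counts["A"])):
        col = [counts[b][i] + pseudocount for b in "ACGT"]
        total = sum(col)
        probs = [c / total for c in col]
        best = max(range(4), key=probs.__getitem__)
        consensus.append("ACGT"[best])
        max_score += math.log(probs[best] / 0.25)
        rows.append("\t".join(f"{p:.3f}" for p in probs))
    threshold = threshold_fraction * max_score
    header = f">{''.join(consensus)}\t{name}\t{threshold:.3f}"
    return "\n".join([header] + rows) + "\n"


# Small example matrix in JASPAR format (the Arnt motif, MA0004.1).
example = """>MA0004.1 Arnt
A  [  4 19  0  0  0  0 ]
C  [ 16  0 20  0  0  0 ]
G  [  0  1  0 20  0 20 ]
T  [  0  0  0  0 20  0 ]
"""

homer_text = "".join(to_homer(n, c) for n, c in parse_jaspar(example))
print(homer_text)
```

The output could be written to a file and passed to findMotifsGenome.pl with -mknown, which limits the known-motif check to the motifs in that file. It is worth checking a few converted motifs by eye before trusting the enrichment results.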
How many basepairs are these regions on average? If it is a lot (for example over 1000bp) then results are expected to be wonky, since with broad regions you would expect many matches by chance. What exactly are the regions? A specific set of methylated regions? Or just all CpG islands/floats?
No, each region contains about 500 base pairs and was extracted from a single gene.
Yes, it is methylated regions.
I used AME from the MEME Suite for enrichment analysis and got several results, but HOMER returned none.
But I think my script is right.
I don't think this makes a lot of sense. If you scan gene body regions against the random genome, you will observe general gene body motifs. Likewise, if you scan promoters against the genome, you get general promoter motifs. I always do like-for-like motif analysis. If I scan, for example, differential regions from ATAC-seq, I compare them with confidently non-differential regions. If I scan differential promoters, I scan them against other promoters. Here, depending on the context, maybe scan these regions against either unmethylated gene bodies or methylated regions from other genes. I don't know the context, but comparing against the whole genome does not sound like-for-like. Also, these tools are built (afaik) with many regions in mind, so how many regions are in your test set?
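To answer the two sanity-check questions in this thread (how long the regions are and how many there are), something like the following Python sketch could be run on the region file. It assumes a tab-separated HOMER-style peak file with columns ID, chromosome, start, end, strand. The function name summarize_regions and the demo rows are made up for illustration, and lengths are taken as end minus start, so they may be off by one depending on the coordinate convention.

```python
import statistics


def summarize_regions(lines):
    """Return (count, mean length, median length) for peak-file lines.

    ASSUMPTION: tab-separated columns are ID, chromosome, start, end, strand.
    Blank lines and lines starting with '#' are skipped.
    """
    lengths = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        start, end = int(fields[2]), int(fields[3])
        lengths.append(end - start)
    return len(lengths), statistics.mean(lengths), statistics.median(lengths)


# Hypothetical demo rows, not real data from the thread.
demo = [
    "#ID\tchr\tstart\tend\tstrand",
    "region1\tchr1\t1000\t1500\t+",
    "region2\tchr1\t8000\t8400\t+",
    "region3\tchr2\t2000\t2600\t-",
]

count, mean_len, median_len = summarize_regions(demo)
print(count, mean_len, median_len)
```

For a real file, the lines could be read with open("homer_peaks_4_parts.txt") and passed to the same function. If a matched background set is prepared in the same format, findMotifsGenome.pl can take it through its -bg option instead of drawing background sequences from the genome.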