I use Find Individual Motif Occurrences (FIMO) from the MEME suite for this kind of analysis. It takes a FASTA file of sequences (e.g. use bedtools getfasta to convert your peaks to FASTA format) and a position frequency matrix for the TF of interest (e.g. downloaded from JASPAR or HOCOMOCO in MEME format). It then scans the sequences for significant similarity to the provided motif and returns the regions that match it.
In this example, let's check a stretch of DNA around the first exon of the human BCL6 gene for motif occurrences against all motifs listed in the JASPAR vertebrate core collection. In your case you should provide a FASTA file with all the sequences you are interested in.
Coordinates of the query sequence (hg38): chr3:187744307-187746589
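For the bedtools getfasta step mentioned above, a minimal sketch could look like the following. The file names (genome.fa, peaks.bed, peaks.fa) and both function names are placeholders, not part of the original answer. bedtools getfasta names each record chr:start-end using the BED start, which is 0-based, so the second helper shifts the start by one on the assumption that the header should carry 1-based coordinates. Please verify that assumption against your own data before relying on it.

```shell
# Sketch only: file and function names are illustrative assumptions.

# Extract the sequence under each peak; headers come out as chr:start-end (BED start, 0-based).
peaks_to_fasta() {
  # $1 = genome FASTA, $2 = peaks in BED format, $3 = output FASTA
  bedtools getfasta -fi "$1" -bed "$2" -fo "$3"
}

# Rewrite ">chr:start-end" headers so that start becomes 1-based (start + 1).
# All other lines are passed through unchanged.
headers_to_1based() {
  awk '
    /^>/ && match($0, /:[0-9]+-[0-9]+$/) {
      chrom = substr($0, 2, RSTART - 2)
      split(substr($0, RSTART + 1), c, "-")
      print ">" chrom ":" (c[1] + 1) "-" c[2]
      next
    }
    { print }
  ' "$1"
}
```

Usage would be along the lines of: peaks_to_fasta genome.fa peaks.bed peaks.fa && headers_to_1based peaks.fa > input.fa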
## Get JASPAR motifs (vertebrate non-redundant core collection) in meme format:
wget http://jaspar.genereg.net/download/CORE/JASPAR2018_CORE_vertebrates_non-redundant_pfms_meme.zip
## Unzip:
unzip JASPAR2018_CORE_vertebrates_non-redundant_pfms_meme.zip
## Install fimo (part of MEME):
conda install -c bioconda meme
## if fimo complains about libiconv libraries, also install that manually:
conda install -c conda-forge libiconv
## run fimo, providing the .meme file matching your TF:
fimo --parse-genomic-coord yourTF.meme input.fa
The input.fa here looks like:
>chr3:187744307-187746589
(sequence...)
When the FASTA header gives the genomic coordinates of the sequence in the form chr:start-end (1-based coordinates) and fimo is run with the --parse-genomic-coord option, the resulting GFF file will show the exact coordinates of the motif in the genome.
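Since the coordinate parsing depends on the header format, it may be worth checking the headers before running fimo. The helper below is a hypothetical sketch (the name bad_headers is made up here) and assumes each header consists of nothing but the region string, as in the input.fa example above.

```shell
# Hypothetical helper: print every FASTA header that is NOT of the form >chr:start-end.
# No output means all headers follow the expected pattern.
bad_headers() {
  awk '/^>/ && $0 !~ /^>[^:]+:[0-9]+-[0-9]+$/ { print }' "$1"
}
```

Running bad_headers input.fa and getting no output suggests every header follows the chr:start-end pattern.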
Check the output in GFF format, which contains the significant matches:
head fimo_out/fimo.gff
##gff-version 3
chr3 fimo nucleotide_motif 187745593 187745603 43.9 - . Name=MA0002.2_chr3-;Alias=RUNX1;ID=MA0002.2-RUNX1-1-chr3;pvalue=4.11e-05;qvalue= 0.177;sequence=TCTTGTGGCTT;
chr3 fimo nucleotide_motif 187746233 187746243 40.4 + . Name=MA0002.2_chr3+;Alias=RUNX1;ID=MA0002.2-RUNX1-2-chr3;pvalue=9.11e-05;qvalue= 0.196;sequence=GTTTGTGGTGT;
chr3 fimo nucleotide_motif 187744975 187744985 41.1 + . Name=MA0003.3_chr3+;Alias=TFAP2A;ID=MA0003.3-TFAP2A-1-chr3;pvalue=7.81e-05;qvalue= 0.323;sequence=CCCCCCAAGCA;
chr3 fimo nucleotide_motif 187745763 187745774 41.9 + . Name=MA0018.3_chr3+;Alias=CREB1;ID=MA0018.3-CREB1-1-chr3;pvalue=6.41e-05;qvalue= 0.146;sequence=TGTGACGTCGGC;
chr3 fimo nucleotide_motif 187745763 187745774 41.9 - . Name=MA0018.3_chr3-;Alias=CREB1;ID=MA0018.3-CREB1-2-chr3;pvalue=6.41e-05;qvalue= 0.146;sequence=GCCGACGTCACA;
chr3 fimo nucleotide_motif 187746240 187746250 50.7 - . Name=MA0025.1_chr3-;Alias=NFIL3;ID=MA0025.1-NFIL3-1-chr3;pvalue=8.51e-06;qvalue= 0.0387;sequence=TTACGTAACAC;
chr3 fimo nucleotide_motif 187746378 187746388 40.5 + . Name=MA0025.1_chr3+;Alias=NFIL3;ID=MA0025.1-NFIL3-2-chr3;pvalue=8.97e-05;qvalue= 0.204;sequence=ATATGTAACAA;
chr3 fimo nucleotide_motif 187745661 187745670 40.4 - . Name=MA0028.2_chr3-;Alias=ELK1;ID=MA0028.2-ELK1-1-chr3;pvalue=9.09e-05;qvalue= 0.412;sequence=ACCGGAACCT;
chr3 fimo nucleotide_motif 187745215 187745225 47.4 + . Name=MA0032.2_chr3+;Alias=FOXC1;ID=MA0032.2-FOXC1-1-chr3;pvalue=1.81e-05;qvalue= 0.0779;sequence=TAAATAAATAT;
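To take these hits into a genome browser or bedtools, one option is to filter them by q-value and convert them to BED. The sketch below assumes the GFF is tab-separated with the attribute layout shown above (Alias=...; qvalue=...); the function name and the cutoff are illustrative choices, not something fimo provides. GFF coordinates are 1-based and inclusive, while BED is 0-based and half-open, hence the start - 1.

```shell
# Hypothetical helper: keep FIMO GFF hits with q-value below a cutoff and print them as BED.
# $1 = fimo GFF file, $2 = q-value cutoff
gff_to_bed() {
  awk -F'\t' -v OFS='\t' -v cutoff="$2" '
    /^#/ { next }                          # skip GFF header and comment lines
    {
      alias = ""; has_q = 0
      n = split($9, attrs, ";")            # attributes are ";"-separated key=value pairs
      for (i = 1; i <= n; i++) {
        split(attrs[i], kv, "=")
        if (kv[1] == "Alias")  alias = kv[2]
        if (kv[1] == "qvalue") { q = kv[2] + 0; has_q = 1 }
      }
      # BED columns: chrom, start (0-based), end, name, score, strand
      if (has_q && q < cutoff) print $1, $4 - 1, $5, alias, $6, $7
    }
  ' "$1"
}
```

For example, gff_to_bed fimo_out/fimo.gff 0.05 > fimo_hits.bed would keep only hits with q-value below 0.05.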
Hi, have you figured it out? I am considering doing the same thing using HOMER.
Is something wrong with the answer below?
No, thanks for the suggestions about FIMO.
I am using HOMER for the motif analysis and I got good results so I want to get the locations of the enriched motifs.
The locations can be found using HOMER, as described in their guide: http://homer.ucsd.edu/homer/ngs/peakMotifs.html
Finding Instance of Specific Motifs
By default, HOMER does not return the locations of each motif found in the motif discovery process. To recover the motif locations, you must first select the motifs you're interested in by getting the "motif file" output by HOMER. You can combine multiple motifs in single file if you like to form a "motif library". To identify motif locations, you have two options:
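1. Run findMotifsGenome.pl with the -find <motif file> option. For example: findMotifsGenome.pl ERalpha.peaks hg18 MotifOutputDirectory/ -find motif1.motif > outputfile.txt
2. Run annotatePeaks.pl with the -m <motif file> option. For example: annotatePeaks.pl ERalpha.peaks hg18 -m motif1.motif > outputfile.txt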
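The "motif library" mentioned above can be sketched as a plain concatenation, assuming (as the quoted guide suggests) that a library is simply several motif files in one file. The function name and the file names are hypothetical.

```shell
# Hypothetical helper: concatenate individual HOMER motif files into one library file.
build_motif_library() {
  # $1 = output library file, remaining arguments = individual .motif files
  out=$1
  shift
  cat "$@" > "$out"
}
```

The library could then be passed in place of a single motif file, e.g. build_motif_library library.motifs motif1.motif motif2.motif followed by findMotifsGenome.pl ERalpha.peaks hg18 MotifOutputDirectory/ -find library.motifs > outputfile.txt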
Cool, did not know Homer has an option to return specific motifs. Thanks, learned something new :)