I want to use the findMotifsGenome.pl program in Homer to identify enriched motifs in aggregated regions from single-cell ATAC-seq data.
What does the -size parameter mean? In their documentation, I found two explanations:
1. length of sequences used
2. -size <#> (fragment size to use for motif finding, default=200); -size <#,#> (i.e. -size -100,50 will get sequences from -100 to +50 relative from center); -size given (uses the exact regions you give it)
Does it specify (1) the size of a sub-region within each given peak, or (2) the DNA fragment size in the prepared library? I am leaning toward (1). If so, and I give -size -100,50, will it search for motifs only within that range in each peak in the BED file? Note that the peaks in my input BED file have an average size of ~1500 bp, so I am not sure whether a 150 bp window is too small.
Also, does anyone know of a publication describing Homer? Could you post a link here if there is a paper I can read?
Hello, I am trying to understand how Homer chooses the center of the peak when I don't specify a size. Do you know if '-size given' is the default? I want to get the positions of the motifs found in the peaks, but from the output it does not look as if the center of the peak is literally the midpoint of the peak.
My command is something like this:
The output BED file (con_brown_diffbind_close.bed_motifs.bed) has start and end positions, while the file con_brown_diffbind_close.tsv has positions relative to the center of the peak (something like +477, -50...). However, the positions in the two files only match if the center of the peak is not the midpoint.
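This is roughly how I compared the two files, written as a small Python sketch. It assumes the offsets in the .tsv are measured from the integer midpoint of each peak and ignores strand; both are my assumptions, and the function names and coordinates are hypothetical.

```python
def offset_to_absolute(peak_start, peak_end, offset):
    """Convert a center-relative offset to a genomic coordinate.

    ASSUMPTION: the center is the integer midpoint of the peak.
    """
    center = (peak_start + peak_end) // 2
    return center + offset


def implied_center(motif_abs_pos, offset):
    """Back out the center that would make an absolute position and an offset agree."""
    return motif_abs_pos - offset


# Hypothetical peak at 1000-2500, motif reported at offset +477
expected = offset_to_absolute(1000, 2500, 477)
print(expected)  # 2227

# If the BED file instead reports the motif at 2300, the implied center is 1823,
# which differs from the midpoint 1750.
print(implied_center(2300, 477))  # 1823
```

Comparing `implied_center` across several motifs in the same peak should show whether a single consistent center is being used, and how far it is from the midpoint.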