Why do peak analysis and motif analysis most often use a whole-genome background when there is no control to compare against?
When I ran motif analysis on 20k peaks, I picked 5000 target sequences and 40k background sequences. Why are the numbers different? Does the difference affect the p-values, which compare the % of target sequences that have motif X with the % of background sequences that have motif X?
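To make the question concrete, here is a rough sketch of how set sizes could enter such a p-value. It assumes a one-sided hypergeometric enrichment test on the pooled target and background sequences; this is my own illustration, not the tool's code, and the function name and motif counts (20% of targets versus 10% of background) are made up for the example.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def hypergeom_enrichment_p(target_hits, n_target, background_hits, n_background):
    """P(X >= target_hits) if the targets were a random draw from the pooled set.

    Hypothetical helper: assumes a hypergeometric model, which may differ
    from what a given motif tool actually computes.
    """
    total = n_target + n_background
    total_hits = target_hits + background_hits
    non_hits = total - total_hits
    # Range of hit counts that is both possible and at least as extreme.
    lower = max(target_hits, n_target - non_hits)
    upper = min(n_target, total_hits)
    log_denom = log_comb(total, n_target)
    log_terms = [
        log_comb(total_hits, k) + log_comb(non_hits, n_target - k) - log_denom
        for k in range(lower, upper + 1)
    ]
    # Sum in log space to avoid overflow in the binomial coefficients.
    peak = max(log_terms)
    return exp(peak) * sum(exp(t - peak) for t in log_terms)


# Same percentages (20% of targets vs 10% of background), different set sizes.
p_small = hypergeom_enrichment_p(100, 500, 400, 4_000)
p_large = hypergeom_enrichment_p(1_000, 5_000, 4_000, 40_000)
print(f"500 targets / 4k background:  p = {p_small:.3g}")
print(f"5000 targets / 40k background: p = {p_large:.3g}")
```

Under this assumed model the two sets do not need to be the same size, because the test compares proportions. The sizes still matter, though: the same 20% versus 10% difference gives a much smaller p-value with the larger sets.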
Thanks, Larry. So picking 5000 targets and 40k background sequences is normal? I used HOMER for this analysis.