How can we identify neighboring genes or overlapping promoter regions within the 1000 bp upstream of the transcription start site (TSS) that we have defined as the promoter? Is it a problem that other promoter regions might fall in this window when we aim to discover novel motifs involved in myogenesis?
We aim to identify transcription factor binding sites (TFBSs) in the promoter regions of genes across 40 hierarchical clusters, defining the promoter as the 1000 bp upstream of the TSS. Should we limit our analysis to genes whose nearest upstream neighboring gene on the opposite strand is more than 1000 bp away, to avoid confusing our results with TFBSs from neighboring genes unrelated to myogenesis, or is this not an issue? And if we do have to limit the analysis, how can we identify such genes in a large set of promoter sequences covering more than 1500 genes?
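
To make the last part concrete, this is roughly the check we have in mind, as a minimal Python sketch rather than a tested pipeline. It assumes we can obtain gene coordinates (chromosome, start, end, strand) from a genome annotation; the `Gene` tuple, the function names, and the toy coordinates below are hypothetical and only for illustration.

```python
from collections import namedtuple

# Hypothetical record; coordinates are assumed to be 1-based and inclusive.
Gene = namedtuple("Gene", "name chrom start end strand")


def promoter_window(gene, upstream=1000):
    """Return (start, end) of the window upstream of the TSS."""
    if gene.strand == "+":
        tss = gene.start
        return max(1, tss - upstream), tss - 1
    # On the minus strand the TSS is the gene's end coordinate,
    # and "upstream" means higher coordinates.
    tss = gene.end
    return tss + 1, tss + upstream


def overlapping_neighbors(genes, upstream=1000):
    """Map each gene name to the (name, strand) of every other gene
    whose gene body overlaps that gene's upstream window."""
    by_chrom = {}
    for g in genes:
        by_chrom.setdefault(g.chrom, []).append(g)

    result = {}
    for g in genes:
        w_start, w_end = promoter_window(g, upstream)
        hits = []
        for other in by_chrom[g.chrom]:
            if other.name == g.name:
                continue
            # Standard closed-interval overlap test.
            if other.start <= w_end and other.end >= w_start:
                hits.append((other.name, other.strand))
        result[g.name] = hits
    return result


# Toy annotation (made-up coordinates).
genes = [
    Gene("A", "chr1", 5000, 8000, "+"),
    Gene("B", "chr1", 3000, 4500, "-"),
    Gene("C", "chr1", 20000, 22000, "+"),
    Gene("D", "chr2", 4200, 4800, "+"),
]
hits = overlapping_neighbors(genes)
clean = [name for name, found in hits.items() if not found]
print(hits)
print(clean)
```

In this toy case, A and B are a divergent pair whose upstream windows each overlap the other gene, while C and D have no annotated neighbor within 1000 bp. For the real data we assume `genes` would have to be the full annotation, not just our 1500 genes, since the neighbors themselves may lie outside the clusters; the returned strand would then let us keep or discard opposite-strand neighbors specifically. The all-against-all loop per chromosome is simple but slow for a whole genome, so sorting by position or using an interval tool would probably be preferable.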