Dear all,
I would like to intersect a fairly large set of ChIP-seq peak BED files with a single BED file containing transcription start sites (TSSs) identified by CAGE (cap analysis of gene expression). By using a large cohort of transcription factors (TFs), I hope to generate a map of the TFs associated with my promoters.
Technically, I have my promoter BED file and currently about 80 ChIP-seq peak BED files. I would like to look at TF binding in close upstream proximity of my promoters (for instance, within 500 bp upstream of the TSS). I read that bedtools could be used for this, but I am limited to a Windows laptop and I am unsure whether it would work there. However, I know a little bit of R and would therefore prefer to use an R package that could do the job.
I would be very grateful for any hints on how I could start the analysis.
Thank you very much!
Tobias
Thank you very much for your reply.
I have two questions before I try your suggestion:
1. If I understand correctly, in step 2 all ChIP-seq peak files are merged into one big file (unioned.peaks.bed). If I find overlaps between peaks and promoters in step 3, how will I know which TF peak file each peak came from? Will the output file state it?
2. How can I set a window (let's say 500 bp) upstream of the promoter in which to look for TF/promoter overlap? As my promoter file is actually a TSS file, the genomic intervals listed are quite narrow (roughly 1-100 bp).
Again, thank you very much for your help!
Pre-process your peak files and add each file's name to its ID column.
Assuming your peak files are named sensibly, you could create a loop to rename them:
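A minimal sketch of such a loop, assuming headerless, tab-delimited BED files named after their TF (for example CTCF.bed) and that the ID lives in column 4. The function name label_peaks, the directory names, and the .labeled.bed suffix are hypothetical and only for illustration:

```shell
# Hypothetical helper: write a labeled copy of every BED file in a directory,
# with column 4 (the ID column) set to the file's base name, e.g. "CTCF".
# Assumes headerless, tab-delimited BED files; the originals are left untouched.
label_peaks() {
    local in_dir="$1" out_dir="$2" f name
    mkdir -p "$out_dir"
    for f in "$in_dir"/*.bed; do
        [ -e "$f" ] || continue            # skip if the glob matched nothing
        name=$(basename "$f" .bed)         # CTCF.bed -> CTCF
        # Overwrite (or create) column 4 with the TF name, keep other columns as-is
        awk -v name="$name" 'BEGIN { FS = OFS = "\t" } { $4 = name; print }' \
            "$f" > "$out_dir/$name.labeled.bed"
    done
}
```

It could be called as label_peaks peaks labeled_peaks. With the TF name in column 4, an overlap report that echoes the IDs of overlapping peaks (bedmap's --echo-map-id-uniq option, for instance) should show which TF each overlap came from.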
Then use bedmap as described, using the labeled peak files instead of the original peak files.
If you want to set a 500 bp upstream window, add --range 500 to the given bedmap command. Overlaps will be reported 500 bases up- and downstream of promoter edges.
Great! Thank you very much for your help!
Best,
Tobias