Hi guys,
I have an ATAC-seq dataset with three conditions, each with three replicates. After running MACS2 within a custom pipeline, I obtained peak files (narrow peaks and summits, size filtered) for the individual replicates and for the merged ones. Now I would like to get the peak information (.bed and .fasta files) associated with specific genes. The aim is to use those peaks for motif discovery and enrichment, and to relate transcription factors to my genes of interest. For that, I will use the MEME Suite, in particular CentriMo, and I will also try HOMER. My question is: how do I get the .bed and .fasta files of the accessible peaks associated with those genes?
Any advice on downstream motif analysis would be greatly appreciated.
Thanks a lot in advance!
Thanks a lot! One more question: should I only consider the exact coordinates of the genes, or should I add some flanking regions (e.g., 5 kb upstream and 1 kb downstream) to improve TF motif discovery and enrichment? I don't know if that would make sense, though!
For HOMER: you just provide the coordinates of the peaks. For MEME, you need to extract the FASTA sequences with a custom script or with bedtools.
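To make the FASTA step concrete, here is a minimal Python sketch of what "extract sequences for BED intervals" means. It assumes the genome is already loaded as a dict of chromosome name to sequence string; `bed_to_fasta`, `toy_genome` and `toy_bed` are made-up names for illustration only. For a real genome you would more likely run `bedtools getfasta -fi genome.fa -bed peaks.bed -fo peaks.fa`, which does the same job on an indexed FASTA.

```python
def bed_to_fasta(bed_lines, genome):
    """Return FASTA text for BED intervals.

    BED coordinates are 0-based with an exclusive end, which matches
    Python slicing, so genome[chrom][start:end] is the interval sequence.
    """
    records = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and the optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        seq = genome[chrom][start:end]
        records.append(f">{chrom}:{start}-{end}\n{seq}\n")
    return "".join(records)


# Toy data (hypothetical): one 10 bp chromosome and one peak.
toy_genome = {"chr1": "ACGTACGTAC"}
toy_bed = ["chr1\t2\t6\tpeak_1"]

fasta_text = bed_to_fasta(toy_bed, toy_genome)
print(fasta_text, end="")
```

The header format `chrom:start-end` is just a convenient choice here so that each sequence can be traced back to its peak.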
For proximal targets: ideally, you should use H3K27ac and H3K4me3 ChIP-seq data, or other histone modification marks, to define potential promoters, enhancers, and other cis-regulatory regions.
For distal targets: you may want to take Hi-C data (promoter-enhancer interactions) into account.
Depends on what you mean by "gene". If you mean the promoter, I would take ATAC peaks that overlap roughly -500 bp to +50 bp relative to the TSS of annotated genes.
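As a rough sketch of that promoter-overlap idea, the Python below builds a strand-aware window around each TSS and keeps the peaks that overlap it. Everything here is an assumption for illustration: the function names, the tuple layouts, the toy genes and peaks, and the convention that the TSS is given as a 0-based position. On real data the same selection is usually done with an interval tool such as bedtools, using a TSS annotation of your choice.

```python
def promoter_window(tss, strand, upstream=500, downstream=50):
    """Return a 0-based, end-exclusive window around a 0-based TSS.

    "Upstream" follows the direction of transcription, so for a gene on
    the minus strand it extends towards higher coordinates.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream + 1
    elif strand == "-":
        start, end = tss - downstream, tss + upstream + 1
    else:
        raise ValueError(f"unexpected strand: {strand!r}")
    return max(start, 0), end  # do not run off the chromosome start


def overlaps(a_start, a_end, b_start, b_end):
    """True if two end-exclusive intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def peaks_in_promoters(peaks, genes, upstream=500, downstream=50):
    """Return (chrom, start, end, peak_name, gene) for overlapping pairs."""
    hits = []
    for gene, chrom, tss, strand in genes:
        w_start, w_end = promoter_window(tss, strand, upstream, downstream)
        for p_chrom, p_start, p_end, p_name in peaks:
            if p_chrom == chrom and overlaps(p_start, p_end, w_start, w_end):
                hits.append((p_chrom, p_start, p_end, p_name, gene))
    return hits


# Toy inputs (hypothetical): (gene, chrom, TSS, strand) and BED-like peaks.
genes = [("GENE_A", "chr1", 1000, "+"), ("GENE_B", "chr1", 5000, "-")]
peaks = [
    ("chr1", 400, 600, "peak_1"),
    ("chr1", 1100, 1300, "peak_2"),
    ("chr1", 5400, 5600, "peak_3"),
    ("chr2", 900, 1100, "peak_4"),
]

hits = peaks_in_promoters(peaks, genes)
for hit in hits:
    print("\t".join(str(x) for x in hit))
```

The `upstream` and `downstream` arguments can be changed to try wider flanks, for example `peaks_in_promoters(peaks, genes, upstream=5000, downstream=1000)` for the 5 kb / 1 kb window mentioned in the question. The nested loop is fine for a handful of genes but would be slow genome-wide.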