Hi!
I need to get enhancer regions from publicly available data, e.g. ENCODE. For histone marks, ENCODE provides bigWig files (signal p-value) and BED files (broadPeak, narrowPeak) that contain peaks. My question is: which file should I use to define the enhancer regions, the bigWig or the broadPeak/narrowPeak files? Also, which histone marks should I choose to define enhancer regions? And which protocol do I need to follow to define enhancer regions from those histone mark files?
Once I have the enhancer region file, I am going to use IGV to visualize whether SNPs overlap with the enhancer regions.
Thank you
There is unfortunately no bulletproof protocol for this. Keep in mind that histone modifications are enriched at enhancer regions, but are not specific to them. Hence, a region with typical enhancer marks such as H3K4me1 and H3K27ac can be an enhancer, but there is no guarantee that it is one. Depending on your purpose, you might simply go for peaks of H3K27ac in the cell type that is relevant for your work. You could use narrowPeak here.
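If you go the narrowPeak route, here is a minimal sketch of how such a file could be read and thresholded in Python. It assumes the standard 10-column narrowPeak layout (signalValue in column 7, -log10 q-value in column 9). The function names and the cutoff of 2.0 (q <= 0.01) are my own illustrative choices, not an ENCODE recommendation.

```python
from typing import Iterable, List, NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int       # 0-based start, BED convention
    end: int         # exclusive end
    signal: float    # column 7: signalValue
    q_value: float   # column 9: -log10(q-value); -1 means "not assigned"


def parse_narrowpeak(lines: Iterable[str]) -> List[Peak]:
    """Parse narrowPeak lines, skipping blank, comment and track lines."""
    peaks = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        f = line.rstrip("\n").split("\t")
        peaks.append(Peak(f[0], int(f[1]), int(f[2]), float(f[6]), float(f[8])))
    return peaks


def filter_by_qvalue(peaks: Iterable[Peak], min_neg_log10_q: float = 2.0) -> List[Peak]:
    """Keep peaks whose -log10(q-value) is at least the cutoff (assumed: 2.0)."""
    return [p for p in peaks if p.q_value >= min_neg_log10_q]
```

To use it on a downloaded file, pass an open file handle (for a gzipped file, `gzip.open(path, "rt")`) to `parse_narrowpeak`. Whether any extra thresholding is needed at all depends on how the peaks were called, so treat the cutoff as something to adjust.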
Thanks for replying. I am completely new to this field.
To get enhancer regions from the H3K4me1 and H3K27ac narrowPeak files from ENCODE, do I need to exclude the peaks in these files that overlap promoters?
If yes, how can I exclude the promoter regions?
Thank you
Enhancers are often considered "distal" regions in the literature, and people exclude peaks within something like 1kb of an annotated canonical TSS. GTF files, e.g. from GENCODE, have TSS annotations that can be used for filtering.
Another resource I know is http://www.enhanceratlas.org/downloadv2.php which at least covers CD4+ cells, so you might want to go with that. I do not say (actually I don't know) that this is necessarily better than other approaches/databases, but at least it's reproducible and to some extent celltype-specific, since you're pulling already processed data from a curated repository.
The problem with these sorts of databases is that it is usually a relatively wild collection of different NGS types that were used to call these regions/enhancers. For example, FANTOM extensively used CAGE-seq (a type of RNA-seq where actively-transcribed regions are assayed), while ENCODE used ChIP-seq a lot, and then there were waves of papers using open chromatin by DNase-seq or ATAC-seq. EnhancerAtlas uses combinations of data afaik. The overlap is probably, as always with these things, limited, and each has its own strengths and weaknesses.
Given the tremendous celltype-specificity of regulatory elements, I would prioritize having a dataset as close to your model system as possible. Even with a good list of "enhancers" (inflationary term in the literature for any distal region) you still do not know which gene(s) they regulate, and the assignment will create many false connections. The entire gene-enhancer connection issue is imo one of the most unsolved problems in modern high-throughput biology.
ATpoint, thanks for replying. The human enhancer annotation on EnhancerAtlas (http://www.enhanceratlas.org/downloadv2.php) is in hg19, but I need GRCh38/hg38. Can I convert the EnhancerAtlas hg19 enhancer coordinates to hg38 using UCSC Genome Browser LiftOver?
Yes, liftOver is a reasonable approach I guess.
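If you prefer the command-line `liftOver` binary from UCSC over the web form, a small Python wrapper could look like the sketch below. It assumes `liftOver` is installed and on your PATH, that the input is a plain BED file, and that you have downloaded the hg19-to-hg38 chain file (`hg19ToHg38.over.chain.gz`) from UCSC. All file names in the example are placeholders.

```python
import shutil
import subprocess


def liftover_command(bed_in, chain, bed_out, unmapped):
    """Build the argument list: liftOver oldFile map.chain newFile unMapped."""
    return ["liftOver", bed_in, chain, bed_out, unmapped]


def run_liftover(bed_in, chain, bed_out, unmapped):
    """Run liftOver; raises if the binary is missing or exits with an error."""
    if shutil.which("liftOver") is None:
        raise RuntimeError("liftOver not found on PATH")
    subprocess.run(liftover_command(bed_in, chain, bed_out, unmapped), check=True)


def count_unmapped(lines):
    """Count records in liftOver's unmapped output (reason lines start with '#')."""
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


# Example call with placeholder file names:
# run_liftover("enhancers.hg19.bed", "hg19ToHg38.over.chain.gz",
#              "enhancers.hg38.bed", "enhancers.unmapped.bed")
```

It is worth checking how many regions end up in the unmapped file, since those are silently missing from the hg38 output.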