I'd like to preface this post by saying that I began working in a new lab about two weeks ago and have spent most of my time learning to analyze ChIP-Seq data provided by other lab members. I've managed to learn quite a lot on my own (using the command line and a variety of tools such as bedtools, HOMER, and MACS2), and at least in theory I understand how peak calling, annotation, and motif discovery work.
My main problem at the moment is that I have no idea where exactly to start in order to generate the type of data I am interested in.
Basically, I have been given 8 files containing ChIP-Seq data for various proteins / histone marks (Pol II, KAP1, H3K27Ac, etc.) ... These files have been aligned and have had their peaks called using MACS2, so I have a Protein_MACS2.summits.bed file for each protein.
I am interested in finding the overlapping regions of H3K27Ac, H3K4me1, and Pol II in exons only (and eventually introns). I then want to generate a metagene plot and/or heatmap of this data that is centered on the beginning or the middle of exons.
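A minimal sketch of the exon-overlap step described above, in Python. It assumes each summits file has already been reduced to (chromosome, 0-based position) pairs and that exons are half-open BED intervals (chrom, start, end). The function names and toy coordinates are hypothetical and only illustrate the logic; on real data a tool such as bedtools intersect is probably the more practical route.

```python
from bisect import bisect_left
from collections import defaultdict


def index_summits(summits):
    """Group summit positions by chromosome and sort them for fast lookup."""
    by_chrom = defaultdict(list)
    for chrom, pos in summits:
        by_chrom[chrom].append(pos)
    return {chrom: sorted(positions) for chrom, positions in by_chrom.items()}


def has_summit(index, chrom, start, end):
    """Return True if any summit lies in the half-open interval [start, end)."""
    positions = index.get(chrom, [])
    i = bisect_left(positions, start)  # first summit at or after the exon start
    return i < len(positions) and positions[i] < end


def exons_with_all_marks(exons, summits_by_mark):
    """Keep only exons that contain at least one summit from every mark."""
    indexes = [index_summits(s) for s in summits_by_mark.values()]
    return [
        exon for exon in exons
        if all(has_summit(ix, exon[0], exon[1], exon[2]) for ix in indexes)
    ]


# Toy data (hypothetical coordinates, not from the files in the question).
exons = [("chr1", 100, 200), ("chr1", 300, 400)]
summits_by_mark = {
    "H3K27Ac": [("chr1", 150)],
    "H3K4me1": [("chr1", 199)],
    "PolII": [("chr1", 120), ("chr1", 350)],
}
shared = exons_with_all_marks(exons, summits_by_mark)
print(shared)
```

Only the first toy exon carries a summit from all three marks; the second has a Pol II summit alone and is dropped.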
For instance, I have seen many metagene plots and heatmaps of Pol II constructed but all of them seem to be centered on the TSS and I can't for the life of me figure out how to center this information on exon starts, or the middle of exons.
My guess is that I must first find some sort of reference data in order to determine which peaks in my ChIP-Seq data fall within exons, but after searching UCSC I can't really find what I'm looking for ... and I'm aware there is an exon database for hg19 in the Table Browser ... but what exactly do I do with this data after I have downloaded it in BED format?
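One possible use of such an exon BED file, sketched in Python: turn each exon into a 1 bp reference point at its start or its middle, so that downstream tools can center on that position instead of on the TSS. This is an illustration under stated assumptions, not a prescribed workflow. The function name is hypothetical, coordinates are assumed to be 0-based half-open BED intervals, and the "start" of a minus-strand exon is taken to be its last base in genomic coordinates.

```python
def exon_reference_point(chrom, start, end, strand, where="start"):
    """Return a 1 bp BED-style interval at the exon start or exon center.

    Assumes 0-based, half-open coordinates [start, end).
    """
    if where == "center":
        pos = (start + end) // 2
    elif where == "start":
        # On the minus strand the exon begins at its highest coordinate.
        pos = end - 1 if strand == "-" else start
    else:
        raise ValueError("where must be 'start' or 'center'")
    return (chrom, pos, pos + 1, strand)


# Toy exons (hypothetical coordinates).
plus_start = exon_reference_point("chr1", 100, 200, "+", "start")
minus_start = exon_reference_point("chr1", 100, 200, "-", "start")
center = exon_reference_point("chr1", 100, 200, "+", "center")
print(plus_start, minus_start, center)
```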
As you can probably tell, I've spent the better part of 3 days attempting to find any sort of solution or tool or really just anything that could help me out but I'm at a loss.
I'd appreciate very detailed answers with tools and step-by-step guides on what to do. However, I'm aware you are all very busy, so a simple guideline of "use this tool to get this, then use the output in this tool to generate this" would be much appreciated. I will figure out how the tools work and what commands must be used.
I have a bigWig file that was generated by our previous bioinformaticist. To get the BED / BED12 file would I simply use the Table Browser in UCSC and download a bed file of Exons Only for whatever genome I will be using? (In this case hg19)
Yup, for a BED12 you would use something like this, which you can just get via FTP from UCSC (refGene is a common source of things like this).
Edit: I should add that I've never personally used anything other than a BED12 file, so I don't know how many columns you actually need (you might need at least 6).
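For reference, a BED12 line stores a whole transcript, with its exons encoded in the blockCount, blockSizes, and blockStarts columns (block starts are relative to the transcript start). A minimal Python sketch of expanding one such line into per-exon six-column records follows. The function name and the example transcript are hypothetical, and tab-separated input is assumed.

```python
def bed12_to_exons(line):
    """Expand one tab-separated BED12 line into per-exon BED6-style tuples."""
    fields = line.rstrip("\n").split("\t")
    chrom, tx_start = fields[0], int(fields[1])
    name, score, strand = fields[3], fields[4], fields[5]
    block_count = int(fields[9])
    # blockSizes and blockStarts are comma-separated, often with a trailing comma.
    sizes = [int(x) for x in fields[10].rstrip(",").split(",")][:block_count]
    starts = [int(x) for x in fields[11].rstrip(",").split(",")][:block_count]
    return [
        (chrom, tx_start + rel, tx_start + rel + size, name, score, strand)
        for rel, size in zip(starts, sizes)
    ]


# Hypothetical two-exon transcript, not a real refGene entry.
example = "chr1\t1000\t2000\tNM_example\t0\t+\t1100\t1900\t0\t2\t100,200,\t0,800,\n"
exon_records = bed12_to_exons(example)
for record in exon_records:
    print("\t".join(str(x) for x in record))
```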
Hey Devon, I have a follow-up question to this. Say I have a region file of exons that I'd like to plot from the center. Would I use the scale-regions or the reference-point method of deepTools? And if I use reference-point, would I simply type "center" as the --referencePoint?
Scale-regions, since that would make the most sense for looking at the coverage over a variable-width feature.
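To illustrate why scaling suits variable-width features, here is a rough Python sketch of the idea: each exon's per-base coverage is squeezed or stretched into the same number of bins, after which the exons can be averaged position by position into a metagene profile. This is a conceptual illustration only and not how deepTools implements it. The function names and coverage values are made up.

```python
def scale_to_bins(coverage, n_bins):
    """Average per-base coverage of one region into n_bins equal-width bins."""
    length = len(coverage)
    if length == 0 or n_bins <= 0:
        raise ValueError("need a non-empty region and a positive bin count")
    bins = []
    for i in range(n_bins):
        lo = i * length // n_bins
        # Guarantee at least one base per bin when the region is shorter than n_bins.
        hi = max((i + 1) * length // n_bins, lo + 1)
        window = coverage[lo:hi]
        bins.append(sum(window) / len(window))
    return bins


def average_profile(regions, n_bins):
    """Scale every region to n_bins, then average bin by bin across regions."""
    scaled = [scale_to_bins(cov, n_bins) for cov in regions]
    return [sum(column) / len(column) for column in zip(*scaled)]


# Two toy exons of different widths (made-up coverage values).
short_exon = [1, 1, 3, 3]
long_exon = [2, 4, 6, 8, 10, 12]
profile = average_profile([short_exon, long_exon], n_bins=2)
print(profile)
```

After scaling, both exons contribute two bins each, so a 4 bp exon and a 6 bp exon can be averaged directly.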