So I have some data from this paper. For example, one of the data files is this peak file for the histone modification H3K4me2. It's a .BED file. I'm supposed to take this data and process it using the methods of another paper I am reading. I've read the papers and understand what I am supposed to do, but since this is my first time working with actual data, I am a bit lost.
If I understand this correctly, the peak files (like the one above) give me reads of DNA sequences that are bound to that specific histone modification. So in theory I am looking at a lot of peaks when referenced against the genome (which is mm9). I have the reference files in .fa format, with each chromosome in a separate file.
So what software can I use to deal with this data? What does the .BED format tell me? I am looking to work in R or MATLAB. Basically, how do I get started?
Secondary question: in the GEO link, at the bottom, they have two additional "replicate" text files. What is the importance of those files?
Edit: Additionally, if anyone knows of any papers describing an "introductory workflow" for this type of data/processing, please let me know.
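To make the setup concrete, here is a minimal Python sketch (not R or MATLAB, just for illustration) of how I assume one would go from a peak's coordinates to the underlying sequence: read one chromosome's .fa file into a string and slice it. The function names and the toy sequence are made up for the example, and I am assuming the usual BED convention of 0-based, end-exclusive coordinates.

```python
import io

def read_fasta_sequence(handle):
    """Return the sequence of the first record in a FASTA file as one string."""
    parts = []
    seen_header = False
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header:
                break  # stop at the start of a second record
            seen_header = True
            continue
        parts.append(line)
    return "".join(parts)

def slice_interval(sequence, start, end):
    """Slice with BED-style coordinates (0-based start, end-exclusive)."""
    return sequence[start:end]

# Toy stand-in for a per-chromosome file such as chr1.fa (made-up sequence).
toy_fasta = io.StringIO(">chrToy\nACGTAC\nGTTTGA\n")
toy_seq = read_fasta_sequence(toy_fasta)
toy_region = slice_interval(toy_seq, 2, 8)
print(toy_region)
```

For a real chromosome, the handle would come from `open(...)` on the .fa file instead of the in-memory toy string; a whole mouse chromosome held as one string is large but should fit in memory on a typical machine.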
Edit (based on alex and devon's replies)
Thanks. I put it up on the genome browser, but I was expecting to see peaks and I don't really see any. I think it's because the data in the file looks like: chr1 4847005 4847705. Does this mean that the authors already looked at the peaks using the raw data and then concluded "between 4847005 and 4847705, all the nucleosomes have this particular histone modification", and that's what I am looking at in the genome browser? Here is a screenshot of the custom track on the genome browser:
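To check my reading of that line, here is a minimal Python sketch that treats it as a three-column BED record (chromosome, start, end) and computes the width of the region. The `BedInterval` type and the helper names are my own, and I am assuming the standard BED convention that the start is 0-based and the end is exclusive.

```python
from typing import NamedTuple

class BedInterval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive

def parse_bed_line(line):
    """Parse the first three whitespace- or tab-separated BED columns."""
    fields = line.split()
    return BedInterval(fields[0], int(fields[1]), int(fields[2]))

def width(interval):
    """Length of the interval in base pairs under the half-open convention."""
    return interval.end - interval.start

example = parse_bed_line("chr1 4847005 4847705")
print(example.chrom, example.start, example.end, width(example))
```

Read this way, the line describes a single 700 bp region on chr1 rather than a signal profile, which would be consistent with the browser drawing flat blocks instead of peak shapes.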
I'm actually doing a similar analysis. See the review references I posted on my question: Fastq file to ALN file
The paper that the data comes from says it is higher-quality data than previous studies, with more sequencing depth. That being said, I am going to download the raw data and try to replicate their peak files for my own sake and learning.