Hello, I am very new to sequencing data analysis and have some questions about how to structure my analysis. I am trying to compare the presence of an enriched mark between a control group and an experimental group. Our animal model is the mouse. I have 3 biological replicates in each of the two groups, and I want to find where the presence of this mark differs between them. I'm fine up to the alignment step, but I'm stuck on how to call peaks. I know that ChIP-seq experiments usually include a control/input sample, which is sequenced without the IP to account for background noise, but we did not sequence a non-IP control. Here are my thoughts on how to approach this and the options I came across. Please let me know which one is the most reasonable approach; references to papers would be very welcome.
Option 1: Randomly pair each of the 3 control mice with one of the 3 experimental mice, and call peaks on each experimental mouse using its paired control mouse as the input. I would end up with 3 peak files, and I would then keep the peaks present in more than 50% of the replicates (i.e., at least 2 of the 3 files).
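The pairing and majority rule in option 1 can be sketched in Python as below. This is a minimal sketch: the BAM file names are hypothetical, and `random_pairing` and `min_support` are illustrative helpers, not part of MACS or any other tool. It only decides which control mouse serves as the input for which experimental mouse, and how many of the 3 peak files a peak must appear in; the peak calling itself would still be done by a peak caller.

```python
import random

# Hypothetical file names: one aligned BAM per mouse.
controls = ["ctrl_1.bam", "ctrl_2.bam", "ctrl_3.bam"]
experimental = ["exp_1.bam", "exp_2.bam", "exp_3.bam"]

def random_pairing(treated, background, seed=None):
    """Pair each treated sample with a distinct background sample, chosen at random."""
    rng = random.Random(seed)      # fixed seed makes the assignment reproducible
    shuffled = background[:]       # copy so the input list is not modified
    rng.shuffle(shuffled)
    return list(zip(treated, shuffled))

def min_support(n_files):
    """Smallest number of files that is strictly more than 50% of n_files."""
    return n_files // 2 + 1

pairs = random_pairing(experimental, controls, seed=1)
for treat, ctrl in pairs:
    print(f"treatment={treat}  input={ctrl}")
print("keep peaks found in at least", min_support(len(pairs)), "of", len(pairs), "files")
```

Recording the seed matters here: with a different seed, a different control mouse becomes the input for each experimental mouse, and the resulting peak sets can change.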
Option 2: Pair each of the 3 control mice with every one of the 3 experimental mice and call peaks for all combinations. I would end up with 9 peak files, and I would then keep the peaks present in more than 50% of them (i.e., at least 5 of the 9 files).
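The 9 runs in option 2 and the matching majority threshold can be enumerated with `itertools.product`. This is a minimal sketch with hypothetical file names; it lists the (treatment, input) combinations rather than running any peak caller.

```python
from itertools import product

# Hypothetical file names: one aligned BAM per mouse.
controls = ["ctrl_1.bam", "ctrl_2.bam", "ctrl_3.bam"]
experimental = ["exp_1.bam", "exp_2.bam", "exp_3.bam"]

# Every (treatment, input) combination: 3 x 3 = 9 peak-calling runs.
all_pairs = list(product(experimental, controls))

# "More than 50%" of 9 files means a peak must be found in at least 5 of them.
threshold = len(all_pairs) // 2 + 1

for i, (treat, ctrl) in enumerate(all_pairs, start=1):
    print(f"run {i}: treatment={treat}  input={ctrl}")
print("majority threshold:", threshold, "of", len(all_pairs))
```

The 9 files are not 9 independent replicates: each experimental mouse appears in 3 of them, so a peak found in every run of just two experimental mice (6 of 9 files) already passes the 5-of-9 threshold.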
Option 3: Call peaks on every sample individually without an input (MACS allows this), and then, within each of the control and experimental groups, keep the peaks shared by more than 50% of the animals. I would end up with two peak files, one for the control group and one for the experimental group, and I would then find the differences between the two files.
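The consensus-and-difference step of option 3 can be sketched in plain Python. This is a minimal sketch, assuming each animal's peaks have already been called and loaded as `(chromosome, start, end)` tuples with half-open, BED-style coordinates; the toy coordinates and the helper names `merge`, `consensus`, and `unique_to` are hypothetical. "Present in more than 50% of the animals" is interpreted here as a base covered by peaks from at least 2 of the 3 animals, and "the difference between the two files" as consensus regions in one group that overlap no consensus region in the other.

```python
from collections import defaultdict

def merge(intervals):
    """Merge overlapping or touching (start, end) intervals from one sample."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]

def consensus(samples, min_samples):
    """Regions covered by peaks from at least `min_samples` of the samples.

    samples: one list of (chrom, start, end) peaks per animal.
    """
    # chrom -> position -> net change in the number of animals covering that base
    deltas = defaultdict(lambda: defaultdict(int))
    for peaks in samples:
        by_chrom = defaultdict(list)
        for chrom, start, end in peaks:
            by_chrom[chrom].append((start, end))
        for chrom, ivs in by_chrom.items():
            for start, end in merge(ivs):  # each animal counts at most once per base
                deltas[chrom][start] += 1
                deltas[chrom][end] -= 1
    result = []
    for chrom in sorted(deltas):
        depth, region_start = 0, None
        for pos in sorted(deltas[chrom]):  # sweep left to right along the chromosome
            depth += deltas[chrom][pos]
            if depth >= min_samples and region_start is None:
                region_start = pos
            elif depth < min_samples and region_start is not None:
                result.append((chrom, region_start, pos))
                region_start = None
    return result

def unique_to(a, b):
    """Regions in `a` that overlap no region in `b` (half-open coordinates)."""
    return [(c, s, e) for c, s, e in a
            if not any(c == c2 and s < e2 and s2 < e for c2, s2, e2 in b)]

# Toy peaks for 3 control and 3 experimental mice (hypothetical coordinates).
control = [
    [("chr1", 100, 1600), ("chr1", 5000, 6500)],
    [("chr1", 200, 1700), ("chr2", 300, 1800)],
    [("chr1", 150, 1500), ("chr1", 5200, 6400)],
]
experimental = [
    [("chr1", 120, 1550), ("chr3", 1000, 2500)],
    [("chr1", 180, 1650), ("chr3", 1100, 2600)],
    [("chr3", 900, 2400)],
]

k = len(control) // 2 + 1  # more than 50% of 3 animals -> at least 2
ctrl_peaks = consensus(control, k)
exp_peaks = consensus(experimental, k)
print(unique_to(exp_peaks, ctrl_peaks))   # → [('chr3', 1000, 2500)]
print(unique_to(ctrl_peaks, exp_peaks))   # → [('chr1', 5200, 6400)]
```

For real peak files, the region-level comparison in `unique_to` corresponds to `bedtools intersect -v -a A.bed -b B.bed`, which scales far better than this pairwise loop.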
Thank you, and I realize this is a very long post.
Are these broad peaks or narrow peaks (as with histone modifications or transcription factors, respectively)?
They are 5hmC-enriched regions with an average length of 1.5 kb.
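For regions of roughly 1.5 kb, one possibility for option 3 is to run MACS2 `callpeak` in broad mode on each mouse with no control file. The sketch below only builds (and does not run) those commands; it assumes MACS2 is installed, the BAM names and the `macs2_broad_cmd` helper are hypothetical, and whether broad mode suits this 5hmC data is itself an assumption to check against your own signal.

```python
import shlex

# Hypothetical file names: one aligned BAM per mouse, no input/control sample.
samples = ["ctrl_1.bam", "ctrl_2.bam", "ctrl_3.bam",
           "exp_1.bam", "exp_2.bam", "exp_3.bam"]

def macs2_broad_cmd(bam, outdir="macs2_out"):
    """Build a MACS2 callpeak command for one sample, without a control, in broad mode."""
    name = bam.rsplit(".", 1)[0]   # "ctrl_1.bam" -> "ctrl_1"
    return [
        "macs2", "callpeak",
        "-t", bam,         # treatment (IP) file; no -c flag because there is no input
        "-f", "BAM",
        "-g", "mm",        # MACS2 shortcut for the mouse effective genome size
        "-n", name,        # prefix for the output file names
        "--broad",         # also report broad regions (broadPeak output)
        "--outdir", outdir,
    ]

for bam in samples:
    print(shlex.join(macs2_broad_cmd(bam)))
```

Each printed line can be pasted into a shell; the resulting per-mouse peak files are the input to the within-group majority step described in option 3.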