Question

Analyzing genome-wide sequencing data for control and experimental groups with biological replicates

10.2 years ago

sisiliuiuc • 0

Hello, I am very new to sequence data analysis and have some questions about how to structure it. I am trying to analyze the difference in the presence of an enriched mark between a control group and an experimental group. Our animal model is the mouse. I have 3 biological replicates in each of the two groups, and I am trying to find the difference in the presence of this mark. I'm fine up to the alignment step, but I'm stuck on how to call peaks. I know people usually use a control (input) sample for ChIP-seq, where the IP step is skipped and the sample is sequenced directly to account for background noise, but we did not sequence a non-IP control. Here are my thoughts on how to approach this and the options I came across. Please let me know which one is the most reasonable approach; references to papers would be wonderful.

option 1: Randomly pair each of the 3 control mice with one of the 3 experimental mice and call peaks using the control mouse as the input. I would end up with 3 peak files, and then I would keep the peaks present in more than 50% of the replicates (at least 2 of 3).

option 2: Pair each of the 3 control mice with each of the 3 experimental mice (all 9 combinations) and call peaks for every pair. I would end up with 9 peak files, and then I would keep the peaks present in more than 50% of them (at least 5 of 9).

option 3: Call peaks on every sample individually without an input (MACS allows this), then find the peaks shared by more than 50% of the animals within each of the control and experimental groups. I would end up with two peak files, one for the control group and one for the experimental group. Then I would find the difference between the two files.
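The majority rule shared by all three options ("present in more than 50% of the replicates") can be sketched in Python. This is a minimal sketch, assuming each peak file has already been parsed into `(chrom, start, end)` tuples with BED-style half-open coordinates; `random_pairs`, `all_pairs`, `merge`, `consensus` and `subtract` are hypothetical helpers, not part of MACS or Galaxy. With 3 replicates the rule keeps regions seen in at least 2 sets, and with 9 pairings in at least 5. `subtract` (the last step of option 3) is plain interval arithmetic, not a statistical test for differential enrichment.

```python
import random
from collections import defaultdict
from itertools import product

# Hypothetical helpers. A peak is assumed to be a (chrom, start, end) tuple
# with BED-style half-open coordinates; one list of peaks per peak file.


def random_pairs(controls, treatments, seed=0):
    """Option 1: pair each treatment mouse with one randomly chosen control."""
    rng = random.Random(seed)
    return list(zip(rng.sample(controls, len(controls)), treatments))


def all_pairs(controls, treatments):
    """Option 2: every control against every treatment (3 x 3 = 9 runs)."""
    return list(product(controls, treatments))


def merge(peaks):
    """Sort and merge overlapping or touching intervals within one set."""
    merged = []
    for chrom, start, end in sorted(peaks):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def consensus(peak_sets, min_fraction=0.5):
    """Regions covered by peaks in MORE than min_fraction of the peak sets."""
    needed = len(peak_sets) * min_fraction
    events = defaultdict(list)
    for peaks in peak_sets:
        # Merge first so one file cannot be counted twice at the same base.
        for chrom, start, end in merge(peaks):
            events[chrom] += [(start, 1), (end, -1)]
    result = []
    for chrom in sorted(events):
        depth, region_start = 0, None
        # At equal positions, ends (-1) sort before starts (+1).
        for pos, delta in sorted(events[chrom]):
            depth += delta
            if depth > needed and region_start is None:
                region_start = pos
            elif depth <= needed and region_start is not None:
                if pos > region_start:
                    result.append((chrom, region_start, pos))
                region_start = None
    return merge(result)


def subtract(a, b):
    """Option 3: parts of regions in `a` not covered by any region in `b`."""
    b_by_chrom = defaultdict(list)
    for chrom, start, end in merge(b):
        b_by_chrom[chrom].append((start, end))
    out = []
    for chrom, start, end in merge(a):
        cur = start
        for b_start, b_end in b_by_chrom[chrom]:
            if b_end <= cur or b_start >= end:
                continue  # no overlap with the still-uncovered remainder
            if b_start > cur:
                out.append((chrom, cur, b_start))
            cur = max(cur, b_end)
        if cur < end:
            out.append((chrom, cur, end))
    return out
```

For example, with three replicate peak sets `[("chr1", 100, 1600), ("chr1", 5000, 6500)]`, `[("chr1", 200, 1700)]` and `[("chr1", 150, 1500), ("chr2", 10, 500)]`, `consensus` returns `[("chr1", 150, 1600)]`: only that stretch is covered by at least 2 of the 3 sets.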

I realize this is a very long post. Thank you.

sequencing genome • 4.0k views

written 10.2 years ago by sisiliuiuc • updated 2.9 years ago by Ram

Are these broad peaks or narrow peaks (as with histone modifications and transcription factors, respectively)?

10.2 years ago by matted

They are 5hmC enriched regions with an average length of 1.5 kb.

10.1 years ago by sisiliuiuc

Ram · Answer 1 · 2014-10-02

10.2 years ago

Istvan Albert 102k

I am not sure I got every detail right, but since you are looking at the differences between samples and you already have a control group and a treatment group, it seems you can use those. You would not need yet another control.

written 10.2 years ago by Istvan Albert • updated 2.9 years ago by Ram

Usually a non-IP sample is used as the input to distinguish signal from noise. I'm calling peaks with MACS in Galaxy, and it will let me call peaks without an input. However, since I have biological controls, it might be fine to use the biological control as the input and the biological treatment as the treatment in MACS. My problem is that, with 3 biological replicates in each of the control and treatment groups, I'm not sure whether I should concatenate the replicates and then call peaks, or call peaks on each pair first and then find the differences common to all 3 pairs.
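The two strategies differ mainly in how MACS is invoked. A minimal sketch in Python that only builds the command lines, assuming MACS2 run from a shell rather than through Galaxy: its `callpeak` command accepts several files after `-t` and `-c` and pools them, which corresponds to concatenating the replicates, while the alternative is one run per control/treatment pair. The BAM file names and the `pooled`/`pairN` run names are hypothetical; `-g mm` selects MACS2's built-in mouse effective genome size.

```python
import shlex

# Hypothetical file names: one aligned BAM per mouse.
CONTROLS = ["ctrl1.bam", "ctrl2.bam", "ctrl3.bam"]
TREATMENTS = ["trt1.bam", "trt2.bam", "trt3.bam"]


def pooled_call(controls, treatments, name="pooled", genome="mm"):
    """Concatenate first: one MACS2 run that pools all -t and all -c files."""
    return ["macs2", "callpeak",
            "-t", *treatments,
            "-c", *controls,
            "-f", "BAM", "-g", genome, "-n", name]


def per_pair_calls(controls, treatments, genome="mm"):
    """Call first: one MACS2 run per (control, treatment) pair.

    The resulting peak files would then be combined with a majority rule.
    """
    return [["macs2", "callpeak", "-t", t, "-c", c,
             "-f", "BAM", "-g", genome, "-n", f"pair{i}"]
            for i, (c, t) in enumerate(zip(controls, treatments), start=1)]


if __name__ == "__main__":
    # Only prints the commands; passing each list to
    # subprocess.run(cmd, check=True) would execute it where MACS2 is installed.
    print(shlex.join(pooled_call(CONTROLS, TREATMENTS)))
    for cmd in per_pair_calls(CONTROLS, TREATMENTS):
        print(shlex.join(cmd))
```

Pooling gives a single peak set with more reads behind it but hides replicate-to-replicate variability; per-pair calls keep the replicates separate at the cost of depending on which control is paired with which treatment.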

written 10.1 years ago by sisiliuiuc • updated 2.9 years ago by Ram