Hi.
I have a ChIP-seq dataset structured as follows: two replicates of control were sequenced together with two replicates each for two (T. factor) treatments. That gives a total of 6 FASTQ files, all from the same lane.
My question is: how do I normalize the data to compare each treatment with the control when there are about 70 million and 8 million reads for rep1 and rep2 of the first treatment, but only about 2 million and 5 million reads for the controls? I am not sure about the total number of reads in the second treatment. I have pasted the bowtie2 alignment stats below.
T1:
8077966 reads; of these:
  8077966 (100.00%) were unpaired; of these:
    2270927 (28.11%) aligned 0 times
    2605491 (32.25%) aligned exactly 1 time
    3201548 (39.63%) aligned >1 times
71.89% overall alignment rate
T2:
70910425 reads; of these:
  70910425 (100.00%) were unpaired; of these:
    32129717 (45.31%) aligned 0 times
    18056752 (25.46%) aligned exactly 1 time
    20723956 (29.23%) aligned >1 times
54.69% overall alignment rate
C1:
5435992 reads; of these:
  5435992 (100.00%) were unpaired; of these:
    1252404 (23.04%) aligned 0 times
    1898388 (34.92%) aligned exactly 1 time
    2285200 (42.04%) aligned >1 times
76.96% overall alignment rate
C2:
2755776 reads; of these:
  2755776 (100.00%) were unpaired; of these:
    2129160 (77.26%) aligned 0 times
    277810 (10.08%) aligned exactly 1 time
    348806 (12.66%) aligned >1 times
22.74% overall alignment rate
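To put the depth imbalance in numbers, here is a minimal Python sketch that recomputes the aligned read counts from the stats above and expresses each sample's uniquely aligned depth relative to the smallest sample. The names `samples`, `depth_table`, and `unique_vs_smallest` are made up for illustration, and comparing uniquely aligned reads is only one possible choice of "usable depth", not a recommendation from any particular tool.

```python
# Read counts copied from the bowtie2 summaries in this post.
samples = {
    "T1": {"total": 8077966, "unique": 2605491, "multi": 3201548},
    "T2": {"total": 70910425, "unique": 18056752, "multi": 20723956},
    "C1": {"total": 5435992, "unique": 1898388, "multi": 2285200},
    "C2": {"total": 2755776, "unique": 277810, "multi": 348806},
}


def depth_table(samples):
    """Summarise aligned depth per sample (hypothetical helper).

    Assumption: uniquely aligned reads are used as the measure of
    usable depth; other definitions are equally possible.
    """
    smallest = min(s["unique"] for s in samples.values())
    table = {}
    for name, s in samples.items():
        aligned = s["unique"] + s["multi"]
        table[name] = {
            "aligned": aligned,
            "alignment_rate": round(100 * aligned / s["total"], 2),
            # How many times deeper this sample is than the shallowest one.
            "unique_vs_smallest": s["unique"] / smallest,
        }
    return table


table = depth_table(samples)
for name, row in table.items():
    print(
        f"{name}: aligned={row['aligned']} "
        f"rate={row['alignment_rate']}% "
        f"unique depth vs smallest={row['unique_vs_smallest']:.1f}x"
    )
```

The ratio column shows how far apart the libraries are before any normalization, which is the gap that simple depth scaling would have to bridge.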
Should I just merge the sorted BAM files of each replicate and use the result as MACS input, or analyze each replicate individually? I did the latter, and the difference was large: about 50 peaks for one and 400 peaks for another. I am not sure whether I should trust the analyses.
Another option is to normalize all three samples (C, T1, T2) together and maybe look for coordinate regulation between T1 and T2 with respect to C. But my main concern is normalizing in such a way that each treatment can be compared to the control to find direct targets.
Thanks for suggestions and ideas. :)
P.S.: I played no role in designing this experiment ;P . The biologists have no clue as to why they did this.
Try looking at your samples with CHANCE. It shows the similarity between replicates and gives general QC information.
If you have replicates, take a look at these two tools: