Question

Different number of peaks between biological replicates?

5.8 years ago

rohitsatyam102 ▴ 940

Hi All

I did a ChIP-seq analysis using both Bowtie2 and BWA-MEM. Peaks were called using IgG as the control. After calling peaks I see one trend: replicate 2 (marked by -2) doesn't have even half the number of peaks of replicate 1 (marked by -1).

This is weird because biological replicates must have relatively equal number of peaks

Alignment with BWA

Sample        No. of peaks
HTH3          87299
HTK27AC-1     11196
HTK27AC-2     428
HTK27ME3-1    35341
HTK27ME3-2    10286
HTK4ME1-1     93845
HTK4ME1-2     1420

Alignment with Bowtie2

Sample        No. of peaks
HTH3          17944
HTK27AC-1     6259
HTK27AC-2     465
HTK27ME3-1    20044
HTK27ME3-2    7773
HTK4ME1-1     9600
HTK4ME1-2     761

Is it normal to see this trend? I referred to the literature, but I see people report mostly common peaks between biological replicates and, at the same time, roughly the same number of peaks.
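To put numbers on the trend, here is a small sketch that computes the replicate 2 / replicate 1 peak ratio for each mark and aligner. The counts are copied from the two tables above; HTH3 is left out because only one sample is listed for it. The function and variable names are just illustrative.

```python
# Peak counts copied from the tables in the question: (replicate 1, replicate 2).
peak_counts = {
    "BWA": {
        "HTK27AC": (11196, 428),
        "HTK27ME3": (35341, 10286),
        "HTK4ME1": (93845, 1420),
    },
    "Bowtie2": {
        "HTK27AC": (6259, 465),
        "HTK27ME3": (20044, 7773),
        "HTK4ME1": (9600, 761),
    },
}


def replicate_ratios(counts):
    """Return {(aligner, mark): rep2 / rep1} for every aligner and mark."""
    return {
        (aligner, mark): rep2 / rep1
        for aligner, marks in counts.items()
        for mark, (rep1, rep2) in marks.items()
    }


ratios = replicate_ratios(peak_counts)
for (aligner, mark), ratio in ratios.items():
    print(f"{aligner:8s} {mark:9s} rep2/rep1 = {ratio:.3f}")
```

Every ratio comes out below 0.5, which is the "not even half" pattern described above, and it holds for both aligners.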

sequencing ChIP-Seq replicates • 3.0k views

updated 5.8 years ago by colin.kern ★ 1.1k • written 5.8 years ago by rohitsatyam102 ▴ 940

score 4 · Answer 1 · 2019-10-31

"This is weird because biological replicates must have relatively equal number of peaks"

No, they don't have to, although ideally they would. Raw peak numbers are strongly influenced by immunoprecipitation efficiency, signal-to-noise ratio and sequencing depth, so it is not unusual to get quite different numbers between replicates. That is exactly why I personally find it pointless to compare raw peak numbers.

H3K27ac is especially problematic in my experience, as it always (edit: in the datasets I've seen from primary specimens) gives rather poor-quality data. H3K4me1 is typically better (= more specific in terms of signal-to-noise ratio). In your second replicate you might have had issues with crosslinking efficiency, cell viability or antibody coupling efficiency; there are a lot of possible reasons. ChIP is always a problematic experiment because it is so antibody-dependent.

I always roll my eyes when I see papers making statements like "condition 1 shows 30% more peaks than condition 2". Statements like this should be based on a proper differential analysis. That means: merge all peaks, create a count matrix for all conditions, and then feed this into tools like DESeq2 or edgeR. If you then see significantly increased counts for one condition over the other in a notable number of peaks, you can make such statements. If not, any fluctuation in peak numbers might be a function of IP efficiency or depth, or the peaks might be small and spurious.

The latter is especially important if data quality (e.g. due to antibody efficiency) is an issue. A sample with slightly better quality might get more peaks at borderline significance, while a sample with reduced quality might not. This is still not very informative about the actual biology. It only matters (in my very humble opinion) if these initial observations survive a proper analysis that takes into account dispersion between replicates etc., after a meaningful normalization for read depth and library composition.
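As a rough illustration of the "merge all peaks, create a count matrix" step, here is a minimal pure-Python sketch. It assumes peaks are half-open (chrom, start, end) tuples as in BED, and that each read is reduced to a single (chrom, position) pair; the toy peaks and reads are made up for illustration and are not from the data in the question. On real BED/BAM files you would use dedicated tools for this, and the resulting matrix is what would go into DESeq2 or edgeR.

```python
from bisect import bisect_left


def merge_peaks(peak_sets):
    """Pool peaks from all samples and merge overlapping or touching intervals."""
    all_peaks = sorted(p for peaks in peak_sets for p in peaks)
    merged = []
    for chrom, start, end in all_peaks:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(p) for p in merged]


def count_matrix(consensus, reads_by_sample):
    """Rows are consensus peaks, columns are samples (sorted by name)."""
    samples = sorted(reads_by_sample)
    # Sorted read positions per sample and chromosome, for binary search.
    index = {}
    for sample in samples:
        by_chrom = {}
        for chrom, pos in reads_by_sample[sample]:
            by_chrom.setdefault(chrom, []).append(pos)
        for positions in by_chrom.values():
            positions.sort()
        index[sample] = by_chrom

    matrix = []
    for chrom, start, end in consensus:
        row = []
        for sample in samples:
            positions = index[sample].get(chrom, [])
            # Number of reads with start <= position < end.
            row.append(bisect_left(positions, end) - bisect_left(positions, start))
        matrix.append(row)
    return samples, matrix


# Toy input (hypothetical, for illustration only).
peaks_rep1 = [("chr1", 100, 200), ("chr1", 500, 600)]
peaks_rep2 = [("chr1", 150, 250)]
reads = {
    "rep1": [("chr1", 120), ("chr1", 180), ("chr1", 550)],
    "rep2": [("chr1", 160), ("chr1", 700)],
}

consensus = merge_peaks([peaks_rep1, peaks_rep2])
samples, matrix = count_matrix(consensus, reads)
print(consensus)
print(samples, matrix)
```

The point of the consensus set is that every sample is quantified over the same regions, so a region that just missed the peak-calling threshold in one replicate still gets a count there.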

score 2 · Answer 2 · 2019-10-31

What if you merge the .bed files, and quantify counts within the peaks?

Do you get a high correlation in the quantifications?
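A minimal sketch of that check, assuming you already have read counts per merged peak for each replicate. The counts below are hypothetical placeholders, and log2 counts-per-million with a pseudocount is just one reasonable choice of scale, not the only one.

```python
import math


def log_cpm(counts, pseudocount=1.0):
    """Scale counts to counts-per-million, then log2-transform."""
    total = sum(counts)
    return [math.log2(c / total * 1e6 + pseudocount) for c in counts]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical counts over the same five merged peaks in two replicates.
rep1_counts = [200, 50, 400, 10, 120]
rep2_counts = [20, 6, 35, 2, 11]

r = pearson(log_cpm(rep1_counts), log_cpm(rep2_counts))
print(f"Pearson r on log2 CPM: {r:.3f}")
```

If the correlation is high even though the peak numbers differ a lot, the replicates probably agree on where the signal is and mostly differ in how much of it crossed the calling threshold.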

In your case, I think there is some bigger difference (for HTH3), but that is what I would probably check if the peak counts were similar but in different positions (like for the other antibodies).

For ATAC-Seq data, I found it helpful to use Bowtie2's --local option to increase the alignment rate. You can also run Picard to get an idea of whether the insert size distribution looks different between the two alignments.

However, for ATAC-Seq data, the alignment rate was very different for default BWA-MEM and Bowtie2. Is the alignment rate also different for your samples, or do you have a similar alignment rate and a different number of peaks?

Also, for histone modifications, I used HOMER findPeaks with -style histone. However, I don't think it changed things as much as you described (for HTH3).

Finally, are the total read counts similar between your replicates? That can also have an effect on the number of peaks called (ATpoint also mentions read depth and read count comparisons).
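One way to take depth out of the picture is to subsample every replicate down to the shallowest one before calling peaks again. Here is a small sketch that computes the fraction of reads to keep per sample; the read totals are hypothetical placeholders (the question doesn't give them), and the fractions would then be passed to whatever subsampling tool you use.

```python
def downsample_fractions(total_reads):
    """Fraction of reads to keep per sample so all match the shallowest sample."""
    target = min(total_reads.values())
    return {sample: target / n for sample, n in total_reads.items()}


# Hypothetical read totals, for illustration only.
totals = {"HTK27AC-1": 40_000_000, "HTK27AC-2": 25_000_000}

fractions = downsample_fractions(totals)
for sample, fraction in fractions.items():
    print(f"{sample}: keep {fraction:.3f} of reads")
```

If the peak numbers still differ a lot at matched depth, then depth is not the main explanation and something like IP efficiency becomes more likely.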

score 0 · Answer 3 · 2019-11-04

Is this data from cell lines or frozen tissue? ChIP-seq can be a very hard assay to get consistent data from, especially with tissue, because there can be very large differences in the signal-to-noise ratio even when following the exact same protocol. Seeing that it is consistently the second replicate that has much lower peak numbers, it could be a difference in how well the chromatin fixation and shearing step worked, assuming the IPs for all the marks were done from aliquots of the same shearing product. Otherwise it may be the actual tissue sample is more degraded for replicate 2.