5.6 years ago
ben.kunfang
Hi,
The data I am using is ENCFF424GON. When I run the ENCODE ChIP-seq pipeline on DNAnexus and use SPP (xcorr) to estimate the fragment length, it gives me 140 bp. However, when I use the macs2 predictd function with the parameters -g hs -m 5 50, it gives me 274 bp. I tried several mfold combinations, but none came close to 140 bp. I am wondering why there is such a large difference between the two algorithms. Both seem to use a cross-correlation method to estimate the fragment length, yet the results are not even close.
Thanks in advance! Kun
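For readers unfamiliar with the idea behind strand cross-correlation, here is a minimal Python sketch. It is not the SPP or MACS2 implementation: it uses a raw match count instead of a normalised correlation, the function names are made up for illustration, and the read positions are toy data in which every fragment spans exactly 140 bp.

```python
from collections import Counter


def strand_cross_correlation(fwd_starts, rev_ends, max_shift):
    """Return {shift: score} for shifts 0..max_shift.

    The score counts forward-strand 5' ends that coincide with a
    reverse-strand 5' end after being moved right by `shift` bp.
    (Simplification: real tools compute a normalised correlation.)
    """
    fwd = Counter(fwd_starts)
    rev = Counter(rev_ends)
    scores = {}
    for shift in range(max_shift + 1):
        scores[shift] = sum(n * rev.get(pos + shift, 0) for pos, n in fwd.items())
    return scores


def estimate_fragment_length(scores):
    """Pick the shift with the highest score (the global maximum)."""
    return max(scores, key=scores.get)


# Toy data (assumption): four fragments, each with 5' ends 140 bp apart.
fwd_starts = [100, 500, 900, 1300]
rev_ends = [p + 140 for p in fwd_starts]

scores = strand_cross_correlation(fwd_starts, rev_ends, max_shift=300)
estimated = estimate_fragment_length(scores)
print(estimated)
```

On real data the profile is noisy and can have more than one bump, so the rule used to pick "the" peak matters as much as the profile itself.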
Difficult to answer. I would argue, though, that in the end it will barely make a difference which length you use for the analysis, as both results reflect short and acceptable fragments for a normal ChIP(-seq) experiment. There is also a method in the csaw package (see the manual at Bioconductor) for fragment length estimation, along with code to plot the result, that might be worth looking at. Maybe the fragmentation did not produce a clear "summit" in terms of length and you have fragments more or less evenly distributed between 150 and 300 bp, so summit identification for xcorr is difficult. Again, I don't think it matters a lot. If you read the library prep protocol, you might also simply use the average length provided there. Typically one aims for a sonication/fragmentation length of 150-300 bp.
Thanks for your reply! I tried csaw, and it does indeed show two local peaks, one around 140 and one around 280. The two algorithms might use different thresholds to select the local peak.
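To illustrate how two selection rules can disagree on a profile with two local peaks, here is a small Python sketch. The profile, the peak heights and both rules are hypothetical; they are not the criteria SPP or MACS2 actually apply, only a demonstration that the choice of rule can flip the answer between roughly 140 and 280.

```python
def local_maxima(profile):
    """Return (shift, score) for every interior point above both neighbours."""
    return [
        (i, profile[i])
        for i in range(1, len(profile) - 1)
        if profile[i - 1] < profile[i] > profile[i + 1]
    ]


# Hypothetical cross-correlation profile with triangular bumps at 140 and 280.
profile = [0.0] * 400
for centre, height in ((140, 0.90), (280, 1.00)):
    for offset in range(-20, 21):
        profile[centre + offset] += height * (1 - abs(offset) / 20)

peaks = local_maxima(profile)
top_score = max(score for _, score in peaks)

# Rule A (assumption): take the highest peak overall.
highest = max(peaks, key=lambda p: p[1])[0]

# Rule B (assumption): take the first peak reaching 80% of the top score.
first_strong = next(shift for shift, score in peaks if score >= 0.8 * top_score)

print(peaks)
print(highest, first_strong)
```

With these made-up heights, rule A reports 280 and rule B reports 140, even though both look at the same profile.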
You can also take the mean of the two sub-peaks. As I said, I really don't think it matters for either peak calling or differential analysis.
Good idea, but I think the estimated fragment length does affect the position of the peaks. I called peaks (macs2) separately with each of the two fragment lengths and intersected the narrowPeak files. 88627/112435 peaks overlap, which means about 21% of the peaks are in different regions. In this case, I would not say it doesn't matter. Thanks!
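For anyone who wants to repeat this kind of comparison, here is a rough Python sketch of counting how many peaks in one set overlap at least one peak in another. The function name and the toy intervals are assumptions for illustration, the pairwise scan is fine for small inputs but slow for full peak sets, and a dedicated interval tool would normally be used instead. The last lines recompute the non-overlapping share from the two counts quoted above.

```python
def count_overlapping(peaks_a, peaks_b):
    """Count peaks in A that overlap at least one peak in B.

    Peaks are (chrom, start, end) tuples with half-open coordinates,
    as in BED/narrowPeak files.
    """
    by_chrom = {}
    for chrom, start, end in peaks_b:
        by_chrom.setdefault(chrom, []).append((start, end))

    hits = 0
    for chrom, start, end in peaks_a:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(s < end and start < e for s, e in by_chrom.get(chrom, [])):
            hits += 1
    return hits


# Toy peak sets (assumption), standing in for two narrowPeak files.
peaks_a = [("chr1", 100, 300), ("chr1", 1000, 1200), ("chr2", 50, 250)]
peaks_b = [("chr1", 250, 450), ("chr2", 400, 600)]
print(count_overlapping(peaks_a, peaks_b))

# Share of non-overlapping peaks from the counts reported in the post.
shared, total = 88627, 112435
non_overlap_pct = 100 * (1 - shared / total)
print(round(non_overlap_pct, 1))
```

Note that the result depends on which file is treated as set A, since the two calls generally contain different numbers of peaks.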