Why does the estimated fragment length differ between SPP and macs2 predictd?
5.6 years ago
ben.kunfang ▴ 30

Hi,

The data I am using is ENCFF424GON. When I run the ENCODE ChIP-seq pipeline on DNAnexus, which uses SPP (xcorr) to estimate the fragment length, I get 140 bp. When I run macs2 predictd with the parameters -g hs -m 5 50, I get 274 bp. I tried several mfold combinations, but none came close to 140 bp. I am wondering why the two algorithms differ so much. Both seem to use a cross-correlation method to estimate the fragment length, yet the results are not even close.
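For readers unfamiliar with the idea, here is a minimal sketch of strand cross-correlation on toy data. It is not the code of SPP or MACS2: the function names, the binding-site positions, and the fixed 150 bp fragment length are all made up for illustration, and the score is a raw co-occurrence count, not a normalised correlation.

```python
def cross_correlation_profile(plus_ends, minus_ends, max_shift):
    """For each shift d, count plus-strand 5' ends that land on a
    minus-strand 5' end after being moved d bp downstream."""
    minus_counts = {}
    for pos in minus_ends:
        minus_counts[pos] = minus_counts.get(pos, 0) + 1
    return [
        sum(minus_counts.get(pos + shift, 0) for pos in plus_ends)
        for shift in range(max_shift + 1)
    ]


def simulate_site(site, fragment_length):
    """Toy fragments of one fixed length, each covering `site`.
    Returns the 5' ends of the plus- and minus-strand reads."""
    plus_ends, minus_ends = [], []
    for offset in range(fragment_length):
        start = site - offset
        plus_ends.append(start)
        # Half-open interval: a simplification so that the peak sits
        # exactly at fragment_length instead of fragment_length - 1.
        minus_ends.append(start + fragment_length)
    return plus_ends, minus_ends


plus_ends, minus_ends = [], []
for site in (10_000, 50_000, 90_000):  # hypothetical binding sites
    p, m = simulate_site(site, fragment_length=150)
    plus_ends += p
    minus_ends += m

profile = cross_correlation_profile(plus_ends, minus_ends, max_shift=400)
estimated = max(range(len(profile)), key=profile.__getitem__)
print(estimated)
```

With a single fragment length the profile has one clear maximum. Real libraries contain a mixture of lengths, so the profile can be broad or have several maxima, and the estimate then depends on how a tool picks among them.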

Thanks in advance! Kun

SPP macs2 estimate fragment length • 2.3k views

Difficult to answer. I would argue, though, that in the end it will make little difference which length you use for the analysis, as both values are short and acceptable fragment lengths for a normal ChIP(-seq) experiment. The csaw package (see the manual at Bioconductor) also has a method for fragment length estimation, plus code to plot the result, which might be worth looking at. Maybe the fragmentation did not produce a clear "summit" in terms of length and your fragments are more or less evenly distributed between 150 and 300 bp, which makes summit identification by cross-correlation difficult. Again, I don't think it matters a lot. If the library prep protocol is available, you could also simply use the average length reported there. Typically one aims for a sonication/fragmentation length of 150-300 bp.


Thanks for your reply! I tried csaw, and the profile indeed has two local peaks, one around 140 bp and one around 280 bp. The two algorithms might use different thresholds to select the local peak.
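To illustrate how a profile with two local peaks can yield two different answers, here is a small sketch. The toy profile and both selection rules are assumptions made up for this example. They are not the actual rules used by SPP, MACS2, or csaw.

```python
def local_maxima(profile):
    """Return (shift, score) for every strict local maximum."""
    return [
        (shift, profile[shift])
        for shift in range(1, len(profile) - 1)
        if profile[shift - 1] < profile[shift] > profile[shift + 1]
    ]


def bump(shift, centre, height, width):
    """Triangular bump, used only to build the toy profile."""
    return max(0.0, height * (1 - abs(shift - centre) / width))


# Toy cross-correlation profile with two bumps of similar height.
profile = [bump(s, 140, 1.00, 40) + bump(s, 280, 0.95, 40) for s in range(401)]

peaks = local_maxima(profile)

# Hypothetical rule A: take the tallest local maximum.
tallest = max(peaks, key=lambda peak: peak[1])[0]

# Hypothetical rule B: ignore shifts below a cutoff, then take the tallest.
MIN_SHIFT = 200  # arbitrary cutoff chosen for this example
tallest_beyond_cutoff = max(
    (peak for peak in peaks if peak[0] >= MIN_SHIFT), key=lambda peak: peak[1]
)[0]

print(peaks)
print(tallest, tallest_beyond_cutoff)
```

Both rules are reasonable on their own, yet they return 140 and 280 on the same profile, which is the kind of disagreement described in this thread.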


You could also take the mean of the two sub-peaks. As I said, I really don't think it matters for either peak calling or differential analysis.


Good idea, but I think the estimated fragment length does affect the position of the peaks. I called peaks with macs2 using each of the two fragment lengths separately and intersected the narrowPeak files. 88627 of 112435 peaks overlap, which means about 21% of the peaks fall in different regions. Given that, I would not say it doesn't matter. Thanks!
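A comparison like this can be sketched in a few lines. The peak coordinates below are invented toy values, the function name is hypothetical, and "overlap" is assumed to mean sharing at least 1 bp, which may differ from the criterion used for the numbers above.

```python
def count_overlapping(peaks_a, peaks_b):
    """Count peaks in peaks_a that share at least 1 bp with a peak in peaks_b.
    Peaks are (chrom, start, end) tuples in half-open BED coordinates."""
    by_chrom = {}
    for chrom, start, end in peaks_b:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits = 0
    for chrom, start, end in peaks_a:
        candidates = by_chrom.get(chrom, [])
        if any(start < b_end and b_start < end for b_start, b_end in candidates):
            hits += 1
    return hits


# Toy peak sets standing in for the two narrowPeak files.
peaks_short = [("chr1", 100, 300), ("chr1", 1000, 1200),
               ("chr2", 500, 700), ("chr2", 900, 1000)]
peaks_long = [("chr1", 250, 450), ("chr1", 5000, 5200), ("chr2", 650, 800)]

shared = count_overlapping(peaks_short, peaks_long)
pct_unmatched = round(100 * (1 - shared / len(peaks_short)), 1)
print(shared, len(peaks_short), pct_unmatched)

# The same arithmetic applied to the counts quoted in this thread.
pct_thread = round(100 * (1 - 88627 / 112435), 1)
print(pct_thread)
```

The pairwise scan is quadratic, so for peak sets of this size a dedicated tool such as bedtools intersect is the more practical choice. Applied to the quoted counts, the arithmetic gives 21.2% of peaks without a counterpart.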
