3.0 years ago
buffealo
Hello,
I want to ask about merging .narrowPeak files generated by MACS2. I am merging them with HOMER mergePeaks, as I found it the most informative option (compared to bedops and bedtools). However, for the downstream analyses I need the other columns, such as the peak score. Also, in the mergePeaks output files, my STAT column contains nothing but 0. What can I do about this? Thank you so much.
If you are dealing with technical replicates I suggest you merge the .bam files prior to peak calling, rather than first performing peak calling for each replicate and then merging them.
They are not technical replicates. I am trying to collect ChIP-seq samples from the same experimental condition and pool them.
If you can briefly explain your experimental design, I can make a suggestion. I am not sure what you mean by "same experimental ChIP-seq samples"; the usual terminology is technical or biological replicates.
Okay. For example, I want to collect ERalpha ChIP-seq samples with no special treatment in one cell line (they could be the controls in their own dataset). There could be biological replicates from the same experiment (I mean rep1, rep2, and so on, not technical ones), and there are also other samples with the same condition; in this case the condition is simply ERalpha control in that specific cell line. I am trying to combine them to create a large ERalpha peak dataset. My idea is that this would increase confidence in the ERalpha peaks, especially in regions where peaks from all of the samples, or a subset of them, overlap.
As far as I understand from your explanation, you are dealing with both biological replicates and non-replicate samples (which come from a completely different cell line). Am I right? If so, you should absolutely not merge either the .bam files or the .bed files from peak calling; that is something you can do only with technical replicates.
A better approach would be to compute the overlap among the peak sets from the different sample groups (with bedtools intersect, for example), choosing an appropriate overlap threshold (I would suggest 50%, -f 0.50), then extract the overlapping regions and explore them in further analysis.
Actually, what I mean is that the samples are all from the same cell line and the same condition (everything is the same except that they come from different experiments). It is like collecting all of the samples with the same condition that are available in a database. That is why I want to merge them somehow. But, as I explained, they come from different labs (they have different inputs, for example), so I want to combine them after the peak calling step and proceed to further analysis with the large consensus dataset I obtain. With bedtools or bedops, however, some of the column information that MACS2 generated is lost. Only the chromosome and the interval coordinates (a simple BED file) are given as output, whereas I need something in .narrowPeak format to conduct further analysis. The most detailed output I could find is from HOMER mergePeaks, but it is not in .narrowPeak format either.
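For what it's worth, the kind of merge described above can be sketched in a few lines of plain Python. This is only a minimal sketch, not what HOMER, bedtools, or bedops do internally. It assumes the standard 10-column narrowPeak layout (chrom, start, end, name, score, strand, signalValue, pValue, qValue, summit offset), merges peaks that overlap by at least 1 bp, and, as an arbitrary choice, carries over the columns of the member peak with the highest signalValue. The names Peak, parse_narrowpeak, merge_peaks, and to_narrowpeak are hypothetical helpers invented for this illustration.

```python
from collections import namedtuple

# One narrowPeak record plus the (hypothetical) sample label it came from.
Peak = namedtuple(
    "Peak", "chrom start end name score strand signal pval qval summit sample"
)


def parse_narrowpeak(lines, sample):
    """Parse tab-separated narrowPeak lines into Peak records."""
    peaks = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track")):
            continue
        f = line.rstrip("\n").split("\t")
        peaks.append(Peak(f[0], int(f[1]), int(f[2]), f[3], int(f[4]), f[5],
                          float(f[6]), float(f[7]), float(f[8]), int(f[9]),
                          sample))
    return peaks


def merge_peaks(peaks):
    """Group peaks that overlap by at least 1 bp into clusters."""
    merged = []
    for p in sorted(peaks, key=lambda p: (p.chrom, p.start, p.end)):
        last = merged[-1] if merged else None
        if last and last["chrom"] == p.chrom and p.start < last["end"]:
            last["end"] = max(last["end"], p.end)
            last["members"].append(p)
        else:
            merged.append({"chrom": p.chrom, "start": p.start,
                           "end": p.end, "members": [p]})
    return merged


def to_narrowpeak(cluster, idx):
    """Write a cluster as one narrowPeak line.

    Assumption: the member with the highest signalValue supplies the score,
    signalValue, pValue, qValue, and summit. The number of distinct samples
    supporting the cluster is stored in the name column.
    """
    best = max(cluster["members"], key=lambda p: p.signal)
    n_samples = len({p.sample for p in cluster["members"]})
    # Re-express the summit offset relative to the merged start coordinate.
    summit = best.start + best.summit - cluster["start"]
    fields = [cluster["chrom"], cluster["start"], cluster["end"],
              f"merged_{idx}_n{n_samples}", best.score, ".",
              best.signal, best.pval, best.qval, summit]
    return "\t".join(str(x) for x in fields)


if __name__ == "__main__":
    # Toy records, made up purely for illustration.
    rep1 = ["chr1\t100\t200\ta\t500\t.\t10.0\t20.0\t15.0\t50"]
    rep2 = ["chr1\t150\t260\tb\t800\t.\t25.5\t40.0\t30.0\t30"]
    pooled = parse_narrowpeak(rep1, "lab1") + parse_narrowpeak(rep2, "lab2")
    for i, cluster in enumerate(merge_peaks(pooled), start=1):
        print(to_narrowpeak(cluster, i))
```

Keeping the sample count in the name column makes it easy to filter afterwards for regions supported by several datasets. Taking the maximum signalValue is just one option; a mean or median across members would be equally defensible.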
I suggest trying to add the argument -wa with bedtools intersect, which writes the original entry in A for each overlap. It should write all the original columns of file A to the output, therefore keeping the .narrowPeak format. At least it works for me. Let me know.
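To make the logic of that filter concrete, here is a rough Python sketch of what keeping A entries with -wa and a 50% threshold amounts to. It is an illustration only, not bedtools code: the function names are hypothetical, the overlap fraction is measured relative to the A peak (which is how -f is defined), and each A line is reported at most once even if it overlaps several B peaks, whereas plain -wa repeats the A entry for each overlap.

```python
def overlap_fraction_of_a(a_start, a_end, b_start, b_end):
    """Fraction of interval A covered by interval B (0.0 if disjoint)."""
    if a_end <= a_start:
        return 0.0
    overlap = min(a_end, b_end) - max(a_start, b_start)
    return max(0, overlap) / (a_end - a_start)


def keep_supported_peaks(a_lines, b_lines, min_frac=0.5):
    """Return the A lines, unchanged, that are covered by some B interval
    over at least min_frac of their length.

    All columns of A (for example the 10 narrowPeak columns) are preserved
    because the original line is passed through untouched.
    """
    b_by_chrom = {}
    for line in b_lines:
        f = line.rstrip("\n").split("\t")
        b_by_chrom.setdefault(f[0], []).append((int(f[1]), int(f[2])))

    kept = []
    for line in a_lines:
        f = line.rstrip("\n").split("\t")
        chrom, start, end = f[0], int(f[1]), int(f[2])
        if any(overlap_fraction_of_a(start, end, bs, be) >= min_frac
               for bs, be in b_by_chrom.get(chrom, [])):
            kept.append(line.rstrip("\n"))
    return kept


if __name__ == "__main__":
    # Toy records, made up purely for illustration.
    a = ["chr1\t100\t200\tpeakA1\t500\t.\t10.0\t20.0\t15.0\t50",
         "chr1\t400\t500\tpeakA2\t300\t.\t5.0\t8.0\t6.0\t40"]
    b = ["chr1\t150\t300\tpeakB1\t700\t.\t12.0\t22.0\t18.0\t60",
         "chr1\t490\t600\tpeakB2\t200\t.\t4.0\t6.0\t5.0\t20"]
    for line in keep_supported_peaks(a, b):
        print(line)
```

The nested scan is quadratic and only suitable for small files; for real peak sets bedtools itself is the better tool.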