I do not understand how to get total peaks from MACS2 shown to me. What mistake(s) am I making here?
1
0
Entering edit mode
8 months ago
Ronin ▴ 10

Hi all,

I am trying to compare peak files between Homer and MACS2. My goal is to compare as integers the number of peaks between the two softwares. Retrieving the total number of peaks through Homer is totally simple -- if I look at the peak file generated from the following code:

findPeaks /scratch/ATACseq/FASTQ_ATAC_43955/sortedbams/picard/01_homer_tsv_files -o /scratch/ATACseq/FASTQ_ATAC_43955/sortedbams/picard/ATAC_01_mapped.sorted.peak.file -gsize 9.58e8 minDist 150 -region

I get this:

head ATAC_01_mapped.sorted.peak.file

# HOMER Peaks
# Peak finding parameters:
# tag directory = /scratch/ATACseq/FASTQ_ATAC_43955/sortedbams/picard/01_homer_tsv_files
#
# total peaks = 17394
# peak size = 208
# peaks found using tags on both strands
# minimum distance between peaks = 416
# fragment length = 208
# genome size = 958000000

So getting total peaks is more or less immediate. Easy!

However, when I try to view the number of peaks produced by MACS2 using this:

macs2 randsample -i ATAC_01_mapped.sorted.rem_dup.bam -f BAMPE -p 100 -o ATAC_01_macs2.bed


macs2 callpeak -f BEDPE --nomodel --shift -37 --extsize 73 -g 928200000 -B --broad --keep-dup all --cutoff-analysis -n ATAC_01 -t ATAC_01_macs2.bed --outdir /scratch/ATACseq/FASTQ_ATAC_43955/sortedbams/picard/macs2 2> ATAC_01macs2.log

I see something like this in the log:

head -n 20 ATAC_01macs2.log

INFO  @ Thu, 07 Mar 2024 17:50:43: 
# Command line: callpeak -f BEDPE --nomodel --shift -37 --extsize 73 -g 928200000 -B --broad --keep-dup all --cutoff-analysis -n ATAC_01 -t ATAC_01_macs2.bed --outdir /scratch/ATACseq/FASTQ_ATAC_43955/sortedbams/picard/macs2
# ARGUMENTS LIST:
# name = ATAC_01
# format = BEDPE
# ChIP-seq file = ['ATAC_01_macs2.bed']
# control file = None
# effective genome size = 9.28e+08
# band width = 300
# model fold = [5, 50]
# qvalue cutoff for narrow/strong regions = 5.00e-02
# qvalue cutoff for broad/weak regions = 1.00e-01
# The maximum gap between significant sites is assigned as the read length/tag size.
# The minimum length of peaks is assigned as the predicted fragment length "d".
# Larger dataset will be scaled towards smaller dataset.
# Range for calculating regional lambda is: 10000 bps
# Broad region calling is on
# Paired-End mode is on

INFO  @ Thu, 07 Mar 2024 17:50:43: #1 read fragment files..

So it's embarassing to ask, but I really don't see anything whatsoever that easily displays the total number of peaks displayed by MACS2. Is there a command option I am missing, or am I making some silly mistake in what file I should be viewing (which is what I suspect is the case)?

Thank you in advance!

ATACseq MACS • 730 views
ADD COMMENT
1
Entering edit mode
8 months ago

Just count the number of lines in one of the output files of macs2 callpeak, e.g. in .narrowPeak you can use wc -l ATAC_01.narrowPeak. If I remember correctly, narrowPeak does not have header. If you use a file with header or comment lines (e.g. the .xls file) make sure you omit those lines from counting.

Having said that, intersecting peak files is a tricky operation, especially if you compare different programs. Depending on which cutoffs you choose, you can get considerably different number of peaks even from the same program. In fact, I prefer to think of peaks as continuos entities rather than as presence/absence.

ADD COMMENT
0
Entering edit mode

This worked just fine for me, thank you very much!

ADD REPLY

Login before adding your answer.

Traffic: 2119 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6