Hi,
I was performing some tests using the peak calling software MACS2 and I stumbled upon an issue: using pair-ended reads as input, I get different results using the same data set, but in BAMPE
format and in BEDPE
format. If I use a data set consisting of single-end reads, I do get the same results using BAM
and BED
. MACS2 was used calling:
/macs2/bin/macs2 callpeak -t ./input.bedpe -f BEDPE -g hs -n bedpe
and
/macs2/bin/macs2 callpeak -t ./input.bampe -f BAMPE -g hs -n bampe
respectively. The conversion from from BAMPE to BEDPE was performed using:
samtools sort -l 0 -n -O bam <input_bampe_file> | samtools view -bf 0x2 | bedtools bamtobed -bedpe -i - > <output_bedpe_file>
Note that the PCR duplicates were removed from the BAMPE
file. Using the BEDPE
format, I get a resulting peaks file of 67711 entries and using BAMPE
, I get 58330 entries. Am I doing something wrong in the conversion between the two formats or did anyone else experience this issue with MACS2? Thank you
Can you look at the .xls file MACS2 produces and see if the number of reads counted is different between your BAMPE and BEDPE files? My guess is that with the BEDPE format you're only using reads which are properly paired to measure fragment size / pileup. With the BAMPE format it's using only the properly-paired reads to measure fragment size, but using all reads to measure pileup.
Hi James. Thanks for the reply. Looking at the xls files generated during the runs:
For
BEDPE
:For
BAMPE
:Even if I convert from
BAMPE
toBEDPE
without thesamtools view -bf 0x2
step which is supposed to convert only properly-paired reads according to the doc then I get 67720 entries in the resulting peak file fromBEDPE
as compared to the previously 67711.Hmm okay I'm not sure, maybe best to post an issue to the MACS GitHub repository (https://github.com/taoliu/MACS/issues) or the MACS Google group (https://groups.google.com/forum/#!forum/macs-announcement). Tao Liu will obviously know more about the inner workings.
I've sent an email to the support team earlier today, let's see if they answer.. thought it might be quicker via biostars as there are more ppl using MACS :]
OK. I've performed a bunch of tests, but still no decent conclusion has been reached. If I use single-end reads as input in MACS2, from both
BAM
andBED
formats I get identical resulting peak files. If I use pair-ended reads inBAMPE
andBEDPE
I get different resulting peak files using the same data set. Line count peak file:using
BAMPE
: 58330using "classic"
BED
created from BAMPE: 90685using
BEDPE
created withsamtools sort -l 0 -n -O bam <input_bampe_file> | bedtools bamtobed -bedpe -i - > <output_bedpe_file>
: 67720using
BEDPE
created withsamtools sort -l 0 -n -O bam <input_bampe_file> | samtools view -bf 0x2 | bedtools bamtobed -bedpe -i - > <output_bedpe_file>
(which keeps only the properly-paired reads) : 67711using not the standard
BEDPE
format, butcut -f 1,2,6 <input_bedpe_file> | sort -k1,1 -k2,2n -k3,3n > <output_bed_file>
: 60998:/ reading through the doc of MACS2 still don't find it clear enough. They use a special BEDPE format that I used in the last test: "The BEDPE format is a simplified and more flexible BED format, which only contains the first three columns defining the chromosome name, left and right position of the fragment from Paired-end sequencing. Note, this is NOT the same format used by BEDTOOLS, and BEDTOOLS version of BEDPE is actually not in a standard BED format. " Any idea why?
Hi,
did you ever got to the bottom of this problem? Kinda fighting same thing here. I did find this answer by taoliu https://github.com/taoliu/MACS/issues/183 where he points out that macs2 does read selection that can be simulated using: samtools view -b -f 2 -F 4 -F 8 -F 256 -F 512 -F 2048 How ever still I'm getting different results using BAMPE or BEDPE, I've found that macs estimates different fragment lengths for each file format.
Hi,
Have you sorted this out? I ran into the same problem. MACS2 BAMPE and BEDPE gave dramatically different "mean fragment size" I currently just used BEDPE, as the fragment size in BAMPE seemed mistakenly small to me.