Using the same paired-end data, MACS2 BAMPE mode gave mean fragment size of 111bp, while BEDPE mode gave mean fragment size of 259bp.
To produce bedpe files from BAM files, I followed one previous thread in the MACS2 GitHub repository: https://github.com/taoliu/MACS/issues/183
Basically, I used "samtools view -b -f 2 -F 4 -F 8 -F 256 -F 512 -F 2048" to select reads first, and finally cut 3 columns of bedtools ouput to make the bedpe files specific to MACS2.
I also used "Picard collectinsertsizemetrics" to check insert size distribution, the median of which is very close to the "mean fragment size" in BEDPE mode. The size in BAMPE mode seemed mistakenly small.
I am using MACS2 2.1.1.20160309 installed with miniconda.
I also posted the same question (along with two log files) in the MACS2 repository at GitHub ( https://github.com/taoliu/MACS/issues/217 ), but have not gotten any feedback. I guess there are more MACS2 users in Biostars. So I re-post the question here, just to make sure I am using MACS2 correctly.
I noticed mcjmigdal in this thread (MACS2 produces different results using BAMPE and BEDPE) also ran into this problem. Hope we can sort this out.
Many thanks, Xinwei
This appears to be a very specific issue that, possibly, the developer will have to address. Like you, I have seen others with the same issue, but there does not appear to have been a solution on any forum.
Have you tried things like re-aligning your data and only including reads that: are properly paired align uniquely in the genome ?
...basically just including the best quality data in your BAM file to see if that improves the situation. With Bowtie, you can ensure unique mapping. In fact, I always switch that on for ChIP-seq data by supplying the
--best -m 1
parameters to Bowtie.Thanks for sharing! I used bowtie2 for mapping these data. After mapping, I required mapping quality to be at least 5 (samtools view -q 5), in order to remove multiply mapped reads.
In this thread (https://github.com/taoliu/MACS/issues/183), the developer mentioned the BAMPE mode implements internally something like "samtools view -b -f 2 -F 4 -F 8 -F 256 -F 512 -F 2048", so only properly mapped reads are used by MACS2. It should not matter much how read alignment is filtered in the previous steps.
Currently, I just use BEDPE mode, as the fragment size in this mode is close to "Picard collectinsertsizemetrics".
Yes, that samtools command should only include properly mapped read pairs, and also only retain primary alignments. Strange issue!
apparently, bedtools bamtobed does the same operation under the hood to paired end reads. I didnt see that info somewhere before and just now realized that my "filtered" bam file has 8 million reads but when I convert it to a bed file - I got 7,4 million. The lost reads apparently were the ones that did not have a properly aligned mate. so that additional filtering with those flags (especially 8) solved my issue an now the n reads is the same after converting