Question

Trying to understand warning from MACS2 about too few paired peaks and differing results in DiffBind

10 months ago

atan • 0

Hi, I'm trying to understand why I get very different DiffBind results for my ATAC-seq samples depending on which copy of my species' genome the reads were aligned to before calling peaks with MACS2. In principle both should be the same genome version, but one was downloaded from NCBI (which would be convenient because it comes with the GTF I need for my RNA-seq work) and the other from Echinobase.

I'm calling peaks with:

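```bash
# One MACS2 run per sample directory (names starting with "C")
for FINAL in C*
do
    # -f BAMPE: input is paired-end BAM; -g: effective genome size
    macs2 callpeak -n "$FINAL" -g 9.218e+08 -f BAMPE --outdir "$OUTPUT_DIR" -t "$FINAL"/*-final.bam
done
```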
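Before looking at DiffBind at all, it may help to compare the raw peak counts per sample for the two references, since a large gap there would point to a problem upstream of dba.count. A minimal sketch, assuming MACS2's default narrow-peak output name (<name>_peaks.narrowPeak); the function name count_peaks is hypothetical:

```bash
# Print "<sample><TAB><number of peaks>" for each MACS2 narrowPeak file in a directory.
count_peaks() {
    local dir=$1 f
    for f in "$dir"/*_peaks.narrowPeak; do
        [ -e "$f" ] || continue   # glob matched nothing; skip the literal pattern
        printf '%s\t%s\n' "$(basename "$f" _peaks.narrowPeak)" "$(awk 'END { print NR }' "$f")"
    done
}

# Example: run once per reference's output directory
# count_peaks "$OUTPUT_DIR"
```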

When I load the resulting peak files into DiffBind and run dba.count, the results differ sharply depending on whether the input BAMs were aligned to the NCBI or the Echinobase version of the genome:

With the NCBI genome, the count matrix has only 2,300 sites and the FRiP scores are abysmal (0.01).

With the Echinobase genome, the count matrix has almost 70,000 sites and the FRiP scores are much better.
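As a cross-check on the FRiP values, they can be recomputed for a single sample outside DiffBind. A minimal sketch, assuming samtools and bedtools are on PATH and using hypothetical function names; it counts reads rather than fragments and uses one sample's own MACS2 peaks, so the numbers will not match DiffBind's exactly (as far as I know, DiffBind computes FRiP over its consensus peak set):

```bash
# Divide two counts and print the result with 4 decimal places (0 if the denominator is 0).
fraction() {
    awk -v num="$1" -v den="$2" 'BEGIN { printf "%.4f\n", (den + 0 > 0 ? num / den : 0) }'
}

# FRiP-style score: primary mapped reads overlapping any peak / all primary mapped reads.
# -F 0x904 drops unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
frip() {
    local bam=$1 peaks=$2 total in_peaks
    total=$(samtools view -c -F 0x904 "$bam")
    # -u writes each read once if it overlaps at least one peak (BAM in, BAM out)
    in_peaks=$(bedtools intersect -u -a "$bam" -b "$peaks" | samtools view -c -F 0x904 -)
    fraction "$in_peaks" "$total"
}

# Example (hypothetical paths):
# frip C1/C1-final.bam "$OUTPUT_DIR"/C1_peaks.narrowPeak
```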

Is there anything in particular I could check to find out why the two genome downloads give such drastically different results? I'd like an actual explanation, if I can find one, rather than just going with whichever genome appears to give "better" results, especially since using the Echinobase genome complicates steps in my RNA-seq work!
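One concrete check is whether the two FASTA files contain the same sequences under different names. A sketch, assuming samtools is available: samtools dict writes one @SQ line per sequence with its name (SN), length (LN) and an MD5 checksum (M5), so identical sequences share an M5 even if they are named differently. Per the SAM specification, M5 is computed on the upper-cased sequence, so soft-masking differences alone will not show up. File and function names here are hypothetical:

```bash
# Build the dictionaries first (hypothetical file names):
#   samtools dict ncbi.fa -o ncbi.dict
#   samtools dict echinobase.fa -o echinobase.dict

# Print "MD5<TAB>length<TAB>name" for each @SQ line of a .dict file.
dict_table() {
    awk -F '\t' '$1 == "@SQ" {
        split("", f)                  # clear tags left over from the previous line
        for (i = 2; i <= NF; i++) {
            tag = substr($i, 1, 2)    # SN, LN, M5, UR, ...
            f[tag] = substr($i, 4)    # value after the "XX:" prefix
        }
        print f["M5"] "\t" f["LN"] "\t" f["SN"]
    }' "$1" | sort
}

# Count checksums shared by both dictionaries or present in only one of them.
compare_dicts() {
    local a b
    a=$(mktemp)
    b=$(mktemp)
    dict_table "$1" | cut -f1 | sort -u > "$a"
    dict_table "$2" | cut -f1 | sort -u > "$b"
    echo "shared: $(comm -12 "$a" "$b" | awk 'END { print NR }')"
    echo "only_in_first: $(comm -23 "$a" "$b" | awk 'END { print NR }')"
    echo "only_in_second: $(comm -13 "$a" "$b" | awk 'END { print NR }')"
    rm -f "$a" "$b"
}

# Example: compare_dicts ncbi.dict echinobase.dict
```

If most checksums are shared, the downloads differ mainly in naming; many unshared checksums suggest the underlying assemblies really differ.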

MACS2 ATACseq DiffBind • 631 views


score 0 · Answer 1 · 2024-05-23

10 months ago

jared.andrews07 ★ 18k

Without looking at the actual sequence of each genome, my guess is that the Echinobase reference simply matches the organism you actually sequenced much more closely. Comparing the mapping quality and the number of mapped reads against each reference should answer this pretty quickly.
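For the mapping-quality side, a rough sketch: stream the primary mapped alignments from each BAM with samtools view -F 0x904 and summarise the MAPQ column (SAM column 5). The function name and the MAPQ threshold of 30 are arbitrary choices for illustration; samtools flagstat on each BAM gives the mapped-read counts.

```bash
# Summarise MAPQ for SAM records read from stdin:
# total records, fraction with MAPQ >= 30, fraction with MAPQ == 0.
mapq_summary() {
    awk -F '\t' '{
        n++
        if ($5 >= 30) hi++
        if ($5 == 0) zero++
    } END {
        printf "reads\t%d\nmapq>=30\t%.4f\nmapq==0\t%.4f\n", n, (n ? hi / n : 0), (n ? zero / n : 0)
    }'
}

# Example (hypothetical path); run for the same sample aligned to each reference:
# samtools view -F 0x904 C1/C1-final.bam | mapq_summary
```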


Reply by atan • 10 months ago

Ah, thank you! You're right: I hadn't noticed that in the bowtie2 report before. With the NCBI genome the alignment rates are around 50-70%, whereas with the Echinobase genome they are around 70-90%. Oddly, according to all the documentation I can find, these should be the same genome (Echinobase says its genome was generated by NCBI), but they are clearly different, both from a direct comparison of the two files and from the differing alignment results.
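Because bowtie2 prints its alignment summary to stderr, the overall rates for every sample and both references can be tabulated side by side. A sketch, assuming each run's stderr was saved to its own log file (for example redirected with 2> sample.log); the function name and log paths are hypothetical:

```bash
# Print "<log file><TAB><overall alignment rate>" for each bowtie2 log given.
alignment_rates() {
    local log rate
    for log in "$@"; do
        # bowtie2's summary ends with a line like "NN.NN% overall alignment rate"
        rate=$(awk '/overall alignment rate/ { print $1 }' "$log")
        printf '%s\t%s\n' "$log" "$rate"
    done
}

# Example (hypothetical directories):
# alignment_rates ncbi_logs/*.log
# alignment_rates echinobase_logs/*.log
```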