Hi, I'm seeking some insight into what might be behind very different DiffBind results when my ATAC-seq samples are aligned to two different copies of my genome and peaks are then called with MACS2. In theory, both should be the same version of the species' genome, but one was downloaded from NCBI (which would be helpful because it comes with a GTF that I need for RNA-seq work) and the other from Echinobase.
I'm calling peaks with:
for FINAL in C*
do
    # Call peaks on each sample's final BAM in paired-end mode
    macs2 callpeak -n "$FINAL" -g 9.218e+08 -f BAMPE --outdir "$OUTPUT_DIR" -t "$FINAL"/*-final.bam
done
When I take the resulting peak files into DiffBind and run dba.count, I get different results depending on whether the input BAM files were aligned to the NCBI or the Echinobase version of the genome.
With the NCBI genome, I get only 2,300 sites in the count matrix and the FRiP scores are abysmal (0.01).
With the Echinobase genome, the count matrix has almost 70,000 sites and the FRiP scores are much better.
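One quick sanity check before DiffBind is to tally the MACS2 peaks per contig for each alignment, to see whether the drop already shows up at the peak-calling stage and whether it is spread across all contigs. This is only a rough sketch: the directory and sample names in the commented usage lines are hypothetical placeholders, and it assumes the standard `NAME_peaks.narrowPeak` files that `macs2 callpeak` writes, with the contig name in column 1.

```shell
# Count peaks per contig in a MACS2 narrowPeak file (column 1 is the contig
# name), printing the contigs with the most peaks first.
peaks_per_contig() {
    cut -f1 "$1" | sort | uniq -c | sort -k1,1nr
}

# Hypothetical paths; substitute the real output directories and sample names.
# wc -l ncbi_peaks/*_peaks.narrowPeak echinobase_peaks/*_peaks.narrowPeak
# peaks_per_contig ncbi_peaks/C1_peaks.narrowPeak | head
```

If the NCBI-aligned samples already have far fewer peaks here, the difference comes from alignment or peak calling rather than from DiffBind itself.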
Is there anything in particular I could look for to determine why the two genome downloads are giving such drastically different results? I'd like to come up with an actual explanation, if I can, rather than just going with the genome that appears to give "better" results, especially since using the Echinobase genome complicates steps in my RNA-seq work!
Ah, thank you! You're right, I hadn't noticed that in the bowtie2 report before. With the NCBI genome, the alignment rates are around 50-70%, whereas the Echinobase genome gives around 70-90%. Weirdly, according to all the documentation I can find, they should be the same genome (Echinobase says the genome is generated by NCBI), but they're clearly different, based on both a comparison of the two files and the differing alignment results.
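For anyone wanting to make the same file comparison, here is a minimal sketch that lists every sequence name and length in a FASTA file and checks whether two files contain the same set of sequence lengths, regardless of how the contigs are named. The function names and the file names in the commented usage lines are made up for illustration, and it assumes plain (uncompressed) FASTA with the sequence name as the first word after `>`.

```shell
# Print "name<TAB>length" for every sequence in a FASTA file, sorted by name.
fasta_lengths() {
    awk '
        /^>/ { if (name != "") print name "\t" len
               name = substr($1, 2); len = 0; next }
        { len += length($0) }
        END  { if (name != "") print name "\t" len }
    ' "$1" | sort
}

# Report whether two FASTA files contain the same multiset of sequence
# lengths, ignoring the sequence names.
compare_fasta() {
    a=$(mktemp); b=$(mktemp)
    fasta_lengths "$1" | cut -f2 | sort -n > "$a"
    fasta_lengths "$2" | cut -f2 | sort -n > "$b"
    if cmp -s "$a" "$b"; then
        echo "same sequence lengths"
    else
        echo "sequence lengths differ"
    fi
    rm -f "$a" "$b"
}

# Hypothetical file names; substitute the real downloads.
# fasta_lengths ncbi_genome.fa | wc -l
# fasta_lengths echinobase_genome.fa | wc -l
# compare_fasta ncbi_genome.fa echinobase_genome.fa
```

Matching lengths with different names would point to a renaming issue only, whereas a different number of sequences or different lengths would suggest that one download includes or omits scaffolds, or that the two are different assembly releases.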