Hi,
I'm attempting to do differential accessibility analysis on ATAC-seq paired-end data. I used Generich for peak calling, using ATAC-seq mode, which calls the peaks on the Tn5 cut sites. Afterwards, I tried to count the reads from my bam files within the regions of the consensus peaks and proceed to do differential analysis with edgeR. I have the doubt of how accurate this could be, as the intervals used for peak calling would be centered at the ends of the reads, so I guess the called peaks would not fully match with the pileup of reads from my bam files. Any ideas on how to address this? Should I be shifting the reads from my bam files?
Here is a small peace of code I'm using for counting reads from my bam files (bamFiles) in the peak ranges called by Genrich (peakList), in case that it's of any help:
# DEFINE CONSENSUS PEAKS
consensus <- soGGi:::runConsensusRegions(GRangesList(peakList), method = "none")
# DEFINE READ PARAMETERS
blacklist <- rtracklayer::import("/ATAC_blacklists/ENCFF001TDO.hg19.bed")
standard.chr <- paste0("chr", c(1:22, "X", "Y"))
param <- readParam(max.frag=1000, pe="both", discard=blacklist, restrict=standard.chr)
# COUNT READS
peak.counts <- regionCounts(bamFiles, consensus, param=param)
I have not looked at the details of your code at all, but have you checked out the
DiffBind
package that provides a wrapper forDESeq2
andedgeR
based analyses for ATAC-seq/ChIP-seq samples?Thank you for your reply. I haven't tried it yet, but I think my concern would be the same. Do you normally use DiffBind to analyse differential accessibility of peaks called by Generich on ATAC-seq mode, without the need to shifting your bam file reads to the Tn5 cut site? Maybe I'm just worried about something that is not a real problem