Hello all,
I'm new to ATAC-seq and am trying to wrap my head around what exactly happens during differential peak analysis. My understanding is that a counts file is assembled for peaks that are common between the two samples, a log2 fold change is calculated for each peak, and whatever peaks exceed some fold-change threshold x are treated as significantly different between the groups. If that only covers peaks common to the samples being compared, what about the peaks that are not common (maybe new areas of chromatin accessibility that open up under a new experimental condition) and their related genes? Do those get analyzed in ATAC-seq? Is it worth analyzing them? If so, how would one go about doing so?
Thanks.
The count matrix should represent peaks that were detected in any of the conditions. A peak (a regulatory element, for example) can be active in one condition but inactive in another. You would lose this information by subsetting to common peaks only.
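To make that concrete, here is a toy sketch in Python. The peak names, the counts, and the pseudocount of 1 are all made up for illustration, and the counts are assumed to be already comparable between conditions. Real differential analysis tools model replicates and variability rather than computing a plain ratio, so treat this only as intuition for why a condition-specific peak belongs in the matrix.

```python
import math

def log2_fold_change(treated, control, pseudocount=1.0):
    """Plain log2 ratio with a pseudocount so a zero count does not divide by zero.
    This is an illustrative simplification, not what a statistical package computes."""
    return math.log2((treated + pseudocount) / (control + pseudocount))

# Hypothetical read counts per peak: (control, treated)
counts = {
    "shared_peak": (100, 110),       # open in both conditions
    "treated_only_peak": (0, 60),    # only opens up in the treated condition
}

for name, (control, treated) in counts.items():
    print(name, round(log2_fold_change(treated, control), 2))
```

The treated-only peak gets by far the larger fold change, and it would never be tested at all if the matrix were restricted to peaks called in both conditions.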
There are a couple of ways to achieve this. You can pool your BAM files and call peaks on the combined dataset. Or you can call peaks on each BAM file individually and keep only peaks called in at least X samples, in Y percent of all samples, or in Z percent of samples per group. Or you can call peaks per group. In the end you merge the peaks and then get counts over these intervals. That count matrix is the input to your differential analysis. The choice of strategy for such scenarios has been formally investigated here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4066778/. Most of the time I call peaks individually per sample, keep the peaks that are present in at least X percent of samples in at least one group, and then merge the peak lists from each group. Which strategy fits best depends on the analysis goal.
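The per-group strategy can be sketched in plain Python as below. This is a minimal illustration, not a replacement for dedicated interval tools. The function names, the half-open `(chrom, start, end)` tuples, and the rule that a sample "supports" a region if any of its peaks overlaps it are my assumptions, and the example peaks are invented.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (chrom, start, end) intervals."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged]

def overlaps(a, b):
    """True if two half-open intervals on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def group_consensus(sample_peaks, min_fraction):
    """Keep merged regions supported by at least min_fraction of a group's samples.
    sample_peaks holds one list of peaks per sample."""
    candidates = merge_intervals([p for peaks in sample_peaks for p in peaks])
    kept = []
    for region in candidates:
        n_support = sum(any(overlaps(region, p) for p in peaks) for peaks in sample_peaks)
        if n_support / len(sample_peaks) >= min_fraction:
            kept.append(region)
    return kept

def consensus_peaks(groups, min_fraction=0.5):
    """Filter peaks within each group, then merge the surviving peaks across groups."""
    per_group = []
    for sample_peaks in groups.values():
        per_group.extend(group_consensus(sample_peaks, min_fraction))
    return merge_intervals(per_group)

# Hypothetical peak calls: two samples per group
groups = {
    "control": [
        [("chr1", 100, 200), ("chr2", 500, 600)],
        [("chr1", 150, 250)],
    ],
    "treated": [
        [("chr1", 120, 220), ("chr1", 1000, 1100)],
        [("chr1", 1050, 1150)],
    ],
}

print(consensus_peaks(groups, min_fraction=1.0))
# [('chr1', 100, 250), ('chr1', 1000, 1150)]
```

With `min_fraction=1.0` the chr2 peak seen in a single control sample is dropped, while the region at chr1:1000-1150 is kept even though it is only called in the treated group. The resulting intervals are what you would then count reads over for every sample.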