I'm trying to figure out what tools most people use for differential chromatin detection among samples (condition1 vs condition2). I normally use diffReps for differential detection for ChIP-seq but haven't done it for ATAC. Any suggestions?
I'm trying to figure out what tools most people use for differential chromatin detection among samples (condition1 vs condition2). I normally use diffReps for differential detection for ChIP-seq but haven't done it for ATAC. Any suggestions?
Call the peaks, by merging all the replicates (bam files), with very less stringent criteria ( p-value of 0.1 ) such that you end up with union of regions ( peaks ) across two biological conditions, that have some signal enrichment.
Now either use the entire peak or center the peaks on summit with +/- 200bp and count the number of reads with in each peak from all replicates independently using featureCounts. ( You can easily covert the above bed
file to saf
format )
awk -v OFS = "\t" 'BEGIN { print "PeakID", "Chr", "Start", "End", "Strand" } { print "Peak_"NR, $1, $2, $3, "." }' test.bed > test.saf
Feed this matrix to DESeq2 or edgeR and get differential peaks. Do some filtering of peaks like removing peaks that have very few counts across replicates, otherwise this will be a problem for multiple testing correction to include so many peaks .
Edit:
Most of the differential binding tools run either DESeq or edgeR in the back end and they are suitable for TFs, as the signal is sharp with out any noise.
But with ATAC, the peaks have both characters of TF and histone modifications.
I would
1. Run macs2 with p-value of 0.1
2. Use the narrowPeaks file, either the entire peak (Better)or take +/- 200bp from the summit (only if peaks called with --summit option in MACS, so that broad peaks are split into multiple sub-peaks).
3. Quantify peaks using featureCounts.
4. Feed the matrix to DESeq2.
You should also upload this file ( from step 2 ) to browser to get a feel of how the regions that you are using for DE analysis looks like. Most likely they will be okay.
This is a great idea. Did you try this method over some published differential chromatin tools? (ex;Diffbind or diffReps). This seems like a similar method to what DiffBind does, however in the past I have noticed peak independent differential detection tools have performed better with my data. Just curious if you've looked into it?
Its not because ATAC is more noisy, Its because the open chromatin is not very dynamic unless the cell environment changes drastically. If you compare the ATAC profiles between two different cell types, you will see differential open chromatin regions but may not with in the same cell type exposed to different environments. It could be pure biological, as ATAC measures openness of chromatin but not the binding ( as what you expect in Chip-Seq experiments ).
I think that you have to learn about design formula and design matrices, that DESseq2 and other packages use to describe experimental designs like yours. The topic is covered in the DESeq2 manual and many posts on the Bioconductor support forum, and is a bit too complex for me to introduce it briefly here.
So I started to draft up a response and then thought it might just be easier to send the data. I'll provide a link to a correlation heat map and MDS plot from edgeR. The problem that we have had with our data is that our sequencing depth has varied pretty substantially so this could have some affect (In the link will also be read depth plots). Also, I should add that there are two replicates for each sample but these are TECHNICAL replicates from the same dog. When it comes to the factor we are running differential analysis on, it's gonna be based upon many factors so these dogs are just normal domestic dogs that we are gonna check differences in sex, bread, etc.
Hi geek_y! I am still a bit confused about how to call differential peaks with DESeq2 since after performing featureCounts I have got several matrixes and the rows of each matrix will not be the same. How can I combine these matrixes into one when DESeq2 only take one input?
I believe before doing featureCounts, you will somehow have to merge the peak calls. So that you end up with a set of "query" genomic regions. Then do featureCounts for these "query" regions across samples. This will make it easier to combine counts from all samples in the end... ready for DESeq2.
Once you have your peaks for each condition, you can try DAStk, a tool recently put out for that purpose:
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
You may have a look at this publication: Nfib Promotes Metastasis through a Widespread Increase in Chromatin Accessibility (doi:10.1016/j.cell.2016.05.052). They use ATAC-seq to compare chromatin accessability between metastatic cells and its solid tumor-of-origin and use DESeq2.
I'll check it out thanks