Together with my colleague Thomas Carroll, who has been looking at quality metrics for ChIP-seq libraries on a large scale (including evaluating more than 400 ENCODE libraries), I have developed ChIPQC, which is included in this week's Bioconductor 2.14 release.
ChIPQC offers a straightforward way to generate an interactive QC report for a sample or a set of samples associated with an experiment. You can check out a report for an example experiment with 11 samples and 5 controls (or a PDF image of this report).
When ChIPQC is supplied with one or more .bam files, it computes a number of metrics relating to the alignments, including duplication rates, mapping quality filtering rates, distribution of pileups (and the associated SSD measure), and an estimated fragment size based on maximizing a relative cross-coverage score. Specifying a genome annotation (human, mouse, rat, fly, and worm are directly supported) enables calculation of genomic profiles, showing enrichment and depletion at various genomic features. Blacklists can also be used for filtering. Finally, supplying one or more sets of peaks enables computation of enrichment of reads in peaks, mean binding profiles around peaks, and clustering/PCA plots of how samples in an experiment are correlated based on peak signals.
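To give a feel for the fragment-size estimate, here is a toy sketch in base R. It is not the ChIPQC implementation: the read positions, the helper covered_bases, and the score definition (the fractional drop in covered bases as plus-strand reads are shifted toward minus-strand reads) are simplified assumptions made up for illustration.

```r
# Toy illustration only: all positions and lengths are invented.
genome_len <- 5000
read_len   <- 36
frag_len   <- 150

# Hypothetical fragment start positions around two binding sites
frag_starts  <- c(1000, 1010, 1020, 3000, 3015, 3030)
plus_starts  <- frag_starts                        # reads from the left fragment end
minus_starts <- frag_starts + frag_len - read_len  # reads from the right fragment end

# Count the bases covered by at least one read
covered_bases <- function(starts, read_len, genome_len) {
  covered <- logical(genome_len)
  for (s in starts) {
    covered[s:(s + read_len - 1)] <- TRUE
  }
  sum(covered)
}

shifts <- 0:300
cov0 <- covered_bases(c(plus_starts, minus_starts), read_len, genome_len)

# Score per shift: relative reduction in covered bases after
# shifting the plus-strand reads to the right by that amount
cc <- sapply(shifts, function(s) {
  cov_s <- covered_bases(c(plus_starts + s, minus_starts), read_len, genome_len)
  (cov0 - cov_s) / cov0
})

best_shift <- shifts[which.max(cc)]
est_frag   <- best_shift + read_len
```

With these made-up reads the score peaks at a shift of 114 bp, which together with the 36 bp read length recovers the simulated 150 bp fragment.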
The package uses BiocParallel for parallelization, and analyses may be limited to a specific subset of chromosomes for faster processing. The package vignette includes several examples with code included.
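As a rough sketch of how these two options fit together (the samplesheet name "samples.csv", the worker count, and the choice of chr18 are placeholders, so adjust them to your own experiment):

```r
library(ChIPQC)
library(BiocParallel)

# Choose a BiocParallel backend before running ChIPQC.
# SerialParam() processes one sample at a time; a multicore
# backend trades memory for speed (worker count is arbitrary here).
register(SerialParam())
# register(MulticoreParam(workers = 4))

# "samples.csv" is a hypothetical DiffBind-style samplesheet.
# Restricting to a single chromosome keeps a first pass quick.
expQC <- ChIPQC("samples.csv", annotation = "hg19", chromosomes = "chr18")
```

Running serially is the conservative choice when memory is tight, since fewer .bam files are being processed at once.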
For those who may already be using the DiffBind package, the two packages work hand-in-hand, using the same samplesheets. If you already have an experiment loaded in DiffBind, you may pass the DBA object directly to ChIPQC and generate a report in two steps, e.g.:
expQC = ChIPQC(DBA, annotation="hg19")
ChIPQCreport(expQC)
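If you would rather look at the numbers directly, or want the report written somewhere specific, something along these lines should work. The report name and folder below are arbitrary placeholders, and expQC is the object created above:

```r
library(ChIPQC)

# Summary table of the computed QC metrics, one row per sample
QCmetrics(expQC)

# Write the report under a name and folder of your choosing
# (both values here are placeholders)
ChIPQCreport(expQC, reportName = "my_experiment_QC", reportFolder = "QCreport")
```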
Once you have a ChIPQCexperiment object, you may use it in DiffBind to analyze peak overlaps and perform differential binding analysis automatically using the edgeR and/or DESeq2 packages, e.g.:
expQC = dba.count(expQC)
expQC = dba.analyze(expQC, method=c(DBA_EDGER,DBA_DESEQ2))
This initial release of ChIPQC includes a baseline of functionality, with a focus on making it easy to generate a fairly comprehensive report quickly. Tom and I intend to continue to develop and support this package so long as there is interest, so please do report any issues you may have, and forward suggestions of what you think would be useful in a ChIP-seq quality assessment package.
Thanks
Rory
Thank you for developing this package. I am going to run QC on ~100 ChIP-seq samples; what's the memory footprint?
Will you add IDR to the assessment of replicate quality?