Hello all,
I am struggling to find references that would guide my experimental design. Specifically, I am trying to determine the optimal sample size for a differential ChIP-seq experiment involving farm animals. I would like to investigate differential TF binding in affected vs. unaffected animals. Confounding variables may include age and/or farm differences. Breed will be the same.
The ENCODE consortium recommends a minimum of 2 replicates, but that seems to refer to a simple ChIP-seq experiment defining binding sites, rather than differential binding between two groups.
I have found the following articles, but find they don't address the issue directly:
- Zuo C, Keleş S. A statistical framework for power calculations in ChIP-seq experiments. Bioinformatics. 2014;30(6):753-760. doi:10.1093/bioinformatics/btt200
- Zhao S, Li CI, Guo Y, et al. RnaSeqSampleSize: real data based sample size estimation for RNA sequencing. BMC Bioinformatics. 2018;19:191. doi:10.1186/s12859-018-2191-5
- Li CI, Samuels DC, Zhao YY, Shyr Y, Guo Y. Power and sample size calculations for high-throughput sequencing-based experiments. Briefings in Bioinformatics. 2018;19(6):1247-1255. doi:10.1093/bib/bbx061
Any help is greatly appreciated.
Russ
Thanks for your quick answer, Friederike. Agreed re: minimizing confounding factors/noise, but the reality is that I will have to accept some variability in this potential experiment. Will try to source as many samples as possible. I will also likely stick to ENCODE's recommendation of 20 million reads/sample for ChIP (https://www.encodeproject.org/chip-seq/transcription_factor/)
That's 20 million reads after alignment though, so aiming for 50 million raw reads is probably prudent. Not sure how the size of your animal's genome compares to the mouse/human genome, but you definitely don't want the sequencing depth to be the limiting factor (cost-wise it's probably negligible compared to the cost of the full experiment).
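To make the depth arithmetic explicit, here is a minimal sketch in Python. The function name `raw_reads_needed` and the usable fractions are illustrative assumptions, not ENCODE values: going from 20 million aligned reads to 50 million raw reads implies that only about 40% of raw reads survive alignment and filtering, and the real fraction will depend on the genome assembly, the antibody, and library quality.

```python
import math


def raw_reads_needed(target_usable_reads: int, usable_fraction: float) -> int:
    """Raw reads to sequence so that roughly `target_usable_reads`
    remain after alignment and filtering.

    `usable_fraction` is an assumed share of raw reads that survive
    (0 < usable_fraction <= 1); estimate it from a pilot run if possible.
    """
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    return math.ceil(target_usable_reads / usable_fraction)


# Target from the thread: 20 million usable reads per sample.
target = 20_000_000

# Hypothetical survival rates; 0.4 corresponds to the 50 million suggestion.
for frac in (0.4, 0.6, 0.8):
    needed = raw_reads_needed(target, frac)
    print(f"usable fraction {frac:.0%}: sequence about {needed / 1e6:.1f} million raw reads")
```

If a pilot lane or a published dataset from the same species gives a better estimate of the usable fraction, plugging that in is more reliable than any of the placeholder values above.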