Hi there,
We are looking into ATAC-seq and Hi-C sequencing of some human-derived samples. Does anyone have any recommendations about sequencing depth, e.g. how many samples to pool per lane on an Illumina HiSeq? I have been looking around, but could not find any real data on this. Also, are there any best practices for the analysis of these techniques? Perhaps it would be worth adding a chapter to the new biostarsbook?
There are many studies out there using ATAC and HiC. You can check the sequencing depth people usually go for.
The best way to sequence ATAC is 50 bp PE, usually with 60 million reads. If you are interested in generating TF footprints, you might need hundreds of millions of reads. I am less sure about Hi-C depth, but in general you might need at least 100 million mapped and valid pairs at the end, which might require a sequencing depth of 300-400 million PE reads.
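To make the Hi-C arithmetic above explicit: ending up with 100 million valid pairs from 300-400 million sequenced pairs implies that only about 25-33% of pairs survive mapping and filtering. Here is a minimal Python sketch of that back-of-the-envelope calculation. The function name and the 25% yield are illustrative assumptions (treating one PE read as one read pair); the real yield depends on your library and protocol.

```python
import math

def raw_pairs_needed(target_valid_pairs, valid_fraction):
    """Estimate how many read pairs to sequence in order to end up with
    a target number of mapped, valid pairs.

    valid_fraction is the (assumed) share of sequenced pairs that
    survive mapping and filtering.
    """
    if not 0 < valid_fraction <= 1:
        raise ValueError("valid_fraction must be in (0, 1]")
    return math.ceil(target_valid_pairs / valid_fraction)

# 100 million valid pairs at an assumed 25% yield
print(raw_pairs_needed(100_000_000, 0.25))
```

A pilot run at shallow depth is one way to estimate the valid fraction for your own libraries before committing to a full lane.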
No clue about Hi-C, but for ATAC-seq it really depends on the study's aim. What do you plan to do? For peak calling and differential accessibility analysis, people typically aim for 25-30 million reads after all filtering, and at least two replicates. How much raw sequencing that takes depends on the percentage of mitochondrial reads in the sample, which (if you do not deplete them) can be up to 80%. Also, the complexity of each library is limited, so sequencing far beyond 30 million filtered reads will (in my experience) mainly pick up duplicates; additional replicates are necessary for increased complexity. As for the analysis, it is typically treated like ChIP-seq: call peaks with MACS (or any tool of choice, disabling any shifting model), make a consensus peak list over all conditions, and get a count matrix, which then goes into DESeq2 or similar frameworks. There are also tools from the Greenleaf lab: chromVAR for linking TF motif presence with chromatin accessibility, and NucleoATAC for nucleosome position calling. Especially the latter (as far as I understood the paper, which demonstrates the technique using the (tiny) yeast genome) requires quite a few replicates and deeply sequenced samples.
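To illustrate what a consensus peak list over all conditions can look like, here is a minimal Python sketch that takes the union of peaks from several samples and merges overlapping intervals. The function name and the example peaks are hypothetical, and this is only one way to define a consensus; in practice people often use something like `bedtools merge` on the concatenated, sorted peak files and then count reads per merged peak with a separate tool.

```python
from collections import defaultdict

def consensus_peaks(peak_sets):
    """Merge peaks from several samples into one non-overlapping union list.

    peak_sets: iterable of lists of (chrom, start, end) tuples, one list
    per sample, using 0-based half-open coordinates as in BED files.
    """
    # Pool the intervals of all samples by chromosome
    by_chrom = defaultdict(list)
    for peaks in peak_sets:
        for chrom, start, end in peaks:
            by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        current_start, current_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if current_start is None:
                current_start, current_end = start, end
            elif start <= current_end:
                # Overlapping or directly adjacent: extend the current peak
                current_end = max(current_end, end)
            else:
                merged.append((chrom, current_start, current_end))
                current_start, current_end = start, end
        merged.append((chrom, current_start, current_end))
    return merged

# Hypothetical peaks from two conditions
condition_a = [("chr1", 100, 200), ("chr1", 500, 600)]
condition_b = [("chr1", 150, 250), ("chr2", 10, 50)]
print(consensus_peaks([condition_a, condition_b]))
```

The merged intervals then serve as the rows of the count matrix, with one column of read counts per sample.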
For Hi-C, the depth you need will depend on the cutter (restriction enzyme) you end up using.
I was thinking about this method, HiC2. It is a very clear paper and protocol, but there are no guidelines on sequencing depth, just one sentence: "Datasets from Rao et al. were selected solely based on their read depth that was comparable to datasets obtained with Hi-C 2.0 (100–200 million reads)"
It also depends on the resolution of the Hi-C domains you want to detect. According to this guide, 100 million filtered reads are sufficient for a resolution of 40 kb.
Great, thanks all for your help, much appreciated! One more question: is there any need for a PhiX spike-in when sequencing Hi-C or ATAC libraries, or can the libraries just go on the HiSeq without issues?