I'll soon be sequencing my ChIP samples of point-source transcription factors (TFs) that I believe have an average to high number of binding sites throughout the genome compared with the "average TF". I am currently studying an organism, *Oikopleura dioica*, which has a very small genome for a chordate: 70 Mb.
The ENCODE guidelines for point-source TF ChIP sequencing are "a minimum of 20 million uniquely mapped reads" (Landt *et al.*, 2012) in mammalian cells and a tenth of that for worms and flies, per factor (combining replicates). For a human genome that would be a coverage of about 0.6X or 1.2X, and for worm about 2X or 4X, depending on whether the read length is 100 or 200 bp (they don't specify in the paper).
If I am to have, say, 8X coverage for the samples of my organism (70 Mb), I'd need to sequence 2.8 million 100 bp PE read pairs (or have that amount mappable, but let's simplify for now), a total output of 560 Mb.
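To make the arithmetic above easy to check, here is a minimal Python sketch. The function names are made up for illustration, and the human (~3.1 Gb) and worm (~100 Mb) genome sizes are round-number assumptions of mine, not values from the ENCODE paper.

```python
def coverage(n_reads, read_length_bp, genome_size_bp):
    """Average fold coverage: total sequenced bases divided by genome size."""
    return n_reads * read_length_bp / genome_size_bp


def read_pairs_needed(target_coverage, genome_size_bp, read_length_bp=100):
    """Number of paired-end read pairs needed to reach a target coverage.

    Each pair contributes 2 * read_length_bp bases.
    """
    total_bases = target_coverage * genome_size_bp
    return total_bases / (2 * read_length_bp)


# Assumed genome sizes (approximate, for illustration only).
HUMAN_BP = 3.1e9
WORM_BP = 1.0e8
OIKOPLEURA_BP = 70e6  # 70 Mb, as stated in the post

# ENCODE minimum: 20 million reads (mammals), a tenth of that for worm and fly.
human_cov_100 = coverage(20e6, 100, HUMAN_BP)
human_cov_200 = coverage(20e6, 200, HUMAN_BP)
worm_cov_100 = coverage(2e6, 100, WORM_BP)
worm_cov_200 = coverage(2e6, 200, WORM_BP)

# My target: 8X on a 70 Mb genome with 100 bp paired-end reads.
pairs = read_pairs_needed(8, OIKOPLEURA_BP, read_length_bp=100)
total_output_bp = 8 * OIKOPLEURA_BP

print(f"human: {human_cov_100:.2f}X / {human_cov_200:.2f}X")
print(f"worm:  {worm_cov_100:.2f}X / {worm_cov_200:.2f}X")
print(f"O. dioica at 8X: {pairs / 1e6:.1f} M read pairs, {total_output_bp / 1e6:.0f} Mb")
```

This treats every read as mappable and spread evenly over the genome, which is the same simplification I make above.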
My samples are going to be run in a facility that has an Illumina HiSeq 2500. This instrument has an output capacity of around 150 million reads per lane, that is, 30 Gb with 200 cycles (100 bp PE). If I only use 2.8 million reads per sample, they'd be able to fit more than 50 samples on a single lane of the instrument using multiplexing. I'll only have about 9 samples for now, and I know there are some other small-genome samples being sequenced around the same time, but the machine is mostly used with mammalian cells.
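The lane-sharing numbers can be sketched the same way. This is only a rough calculation under my own assumptions: the lane capacity is the approximate 150 million figure quoted above, every sample gets exactly 2.8 million read pairs, and barcode and index overhead is ignored. The names are hypothetical.

```python
LANE_READ_PAIRS = 150_000_000   # approximate lane capacity quoted above
PAIRS_PER_SAMPLE = 2_800_000    # 8X on a 70 Mb genome with 100 bp PE
N_SAMPLES = 9                   # samples I have for now

# How many samples at this depth fit on one lane (integer division).
max_samples_per_lane = LANE_READ_PAIRS // PAIRS_PER_SAMPLE

# Share of a lane that my current samples would use at the target depth.
pairs_for_my_samples = N_SAMPLES * PAIRS_PER_SAMPLE
lane_fraction = pairs_for_my_samples / LANE_READ_PAIRS

# Depth per sample if my samples shared a whole lane equally instead.
pairs_if_lane_split_evenly = LANE_READ_PAIRS / N_SAMPLES

print(f"samples per lane at target depth: {max_samples_per_lane}")
print(f"lane fraction used by {N_SAMPLES} samples: {lane_fraction:.1%}")
print(f"read pairs per sample if the lane is split {N_SAMPLES} ways: "
      f"{pairs_if_lane_split_evenly / 1e6:.1f} M")
```

So at the target depth my samples would use well under a fifth of a lane, which is why the question of how to share the rest comes up.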
Concerning the sequencing depth for my samples: is my general reasoning correct? Am I right to transpose the guidelines between organisms based on coverage? Concerning how to manage the sequencing: what's the ideal way of handling this? Should I sequence my samples more deeply to avoid waiting in a queue for other small-genome samples?
Thank you.
I keep ~5 million reads as my reference for S.c.