Question

Merging ChIP-seq samples sequenced in two different runs

4.6 years ago

srhic ▴ 70

Hello,

I have a ChIP-seq experiment in which the number of reads is rather low (~10M per sample). However, most QC parameters show that the ChIP efficiency is not bad, so we have decided to resequence the same samples. My question is: what would be the best approach to combine the new data with the existing data? Would I just treat the new data as a technical replicate and merge the FASTQs/BAMs together? Or would I treat it as a biological replicate and account for a batch effect when using edgeR for differential analysis between my conditions?

(A somewhat unrelated question: I have noticed that in most of my ChIP-seq experiments the input sample has more reads than the treatment samples. Is this normal, or just something random?)

Thanks

ChIP-Seq RNA-Seq • 1.6k views

updated 4.6 years ago by Carlo Yague 9.0k • written 4.6 years ago by srhic ▴ 70

score 3 · Accepted Answer · 2021-02-18

I usually consider re-sequencing of the same sample (same library prep) as a technical replicate and merge the BAM files once I have confirmed that the replicates are technically sound. For instance, for ChIP-seq, I would look at some control peaks and assess whether the two runs behave similarly. If they don't, this certainly raises a flag, and I would not merge them unless there is a good explanation for the difference.
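To make the "do the runs behave similarly" check a bit more concrete, here is a minimal sketch, not a definitive recipe. It assumes you have already counted reads at a handful of control peaks in each run (with whatever tool you prefer); the function names, the example counts, and the 0.9 correlation cutoff are all hypothetical choices for illustration, not an established standard.

```python
import math

def cpm(counts, library_size):
    """Scale raw read counts to counts per million mapped reads."""
    return [c * 1e6 / library_size for c in counts]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def runs_agree(counts_run1, size_run1, counts_run2, size_run2, min_r=0.9):
    """Compare log2(CPM + 1) at the same control peaks in two runs.

    min_r is an arbitrary cutoff chosen for this sketch (assumption).
    """
    log_run1 = [math.log2(v + 1) for v in cpm(counts_run1, size_run1)]
    log_run2 = [math.log2(v + 1) for v in cpm(counts_run2, size_run2)]
    return pearson(log_run1, log_run2) >= min_r

# Hypothetical counts at five control peaks; run 2 was sequenced twice as deep.
run1_counts, run1_size = [120, 15, 300, 8, 60], 10_000_000
run2_counts, run2_size = [250, 28, 610, 20, 115], 20_000_000

if runs_agree(run1_counts, run1_size, run2_counts, run2_size):
    print("Runs look concordant at control peaks; merging seems reasonable.")
else:
    print("Runs disagree at control peaks; investigate before merging.")
```

A handful of peaks only gives a rough impression, so I would still look at the tracks in a genome browser before merging.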

Concerning having more reads in the input than in the IP, I guess it depends entirely on how you pooled the barcoded libraries before sequencing. If you aimed for an equimolar pool and still have more reads in your input, then this likely reflects library quantification issues or adapter contamination in your IP, both of which happen more frequently when the IP efficiency is low. That being said, nothing stops you from stepping away from an equimolar pool and mixing in more material from the IP than from the input. After all, the IP reads are usually more informative than the input reads (PS: I'm not saying that the input control is not important here).
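As a rough illustration of the pooling arithmetic, here is a small sketch. It assumes read yield is simply proportional to each library's molar share of the pool, which is exactly what breaks down when quantification is off; the function name, the library names, and the numbers are made up for the example.

```python
def expected_reads(pool_ratios, total_reads):
    """Split a run's total reads across libraries in proportion to their pool ratios.

    Assumption: yield is proportional to molar share (idealized).
    """
    total_ratio = sum(pool_ratios.values())
    return {name: total_reads * ratio / total_ratio
            for name, ratio in pool_ratios.items()}

# Hypothetical run yielding 60M reads across three libraries.
equimolar = expected_reads({"IP_rep1": 1, "IP_rep2": 1, "input": 1}, 60_000_000)
ip_weighted = expected_reads({"IP_rep1": 2, "IP_rep2": 2, "input": 1}, 60_000_000)

for label, result in [("equimolar", equimolar), ("IP-weighted 2:2:1", ip_weighted)]:
    print(label, {name: f"{reads / 1e6:.0f}M" for name, reads in result.items()})
```

If your observed read counts deviate strongly from what the intended ratios predict, that points back to quantification or adapter problems rather than to the pooling design.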