Question

Batch and combining reads from two flow cells for the same ChIP/ATAC-seq library

7.1 years ago

reskejak ▴ 40

I'm sure this has been answered previously, but I have been unable to find a directly similar question here for ChIP-seq (or similar) data.

We have a library which has been sequenced twice, on two flow cells, corresponding to one biological replicate sample from an experiment (each flow cell is an independent experiment). This library has the lowest complexity output from the group of samples assayed, so we would like to combine reads from the two flow cells.

Qualitatively, we have observed that reads from both flow cells appear highly similar (in terms of mapping stats, duplicates, complexity, signal-to-noise, FRiP, peaks called, UCSC genome browser, etc.) and plotting PCA with DiffBind shows these samples to be nearly identical with or without modeling batch.

Can we directly combine these reads by way of cat or similar? What about combining BAM files, as in samtools merge? How else can we assess the appropriateness of combining these libraries, and which would be the most practical way of doing so?

ATAC-seq ChIP-Seq Illumina batch combine • 3.3k views

updated 7.1 years ago by Devon Ryan 105k • written 7.1 years ago by reskejak ▴ 40

score 5 · Answer 1 · 2018-06-20

As a general principle, if you just sequenced the same library twice, go ahead and merge the results. It doesn't much matter whether you cat the FASTQ files together or use samtools merge on the BAM files; you'll get essentially the same results either way.
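For the FASTQ route, here is a minimal Python sketch of what cat does, assuming gzip-compressed FASTQ input. It relies on the fact that gzip files concatenated byte for byte still form a valid gzip stream. The function names and file names are hypothetical, chosen only for illustration.

```python
import gzip
import shutil


def concat_fastq_gz(inputs, output):
    """Byte-wise concatenation, equivalent to `cat a.fastq.gz b.fastq.gz > out`."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return output


def count_reads(path):
    """Count FASTQ records, assuming the usual 4 lines per record."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for _ in handle) // 4


# Hypothetical usage (file names are placeholders):
# merged = concat_fastq_gz(["fc1_R1.fastq.gz", "fc2_R1.fastq.gz"], "merged_R1.fastq.gz")
# The merged read count should equal the sum of the two input counts.
```

For paired-end data, concatenate the R1 files and the R2 files separately and in the same flow cell order so that mates stay in sync. On the BAM side, the usual invocation is `samtools merge merged.bam run1.bam run2.bam`, again with placeholder file names.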

In your particular case you've also nicely assessed the comparability of the technical replicates. That's great, well done! There's no hard and fast rule for exactly how similar technical replicates need to be to merge them. But if you look at the QC metrics you created and the runs look as similar as yours sound then you can safely merge the runs.
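If you want the QC comparison side by side, a small sketch along these lines may help. It only reports the relative difference per metric and deliberately applies no cutoff, since there is no agreed threshold. The metric names and values are placeholders, not numbers from this thread.

```python
def relative_differences(run_a, run_b):
    """Return {metric: |a - b| / mean(a, b)} for metrics present in both runs."""
    diffs = {}
    for metric in sorted(set(run_a) & set(run_b)):
        a, b = run_a[metric], run_b[metric]
        mean = (a + b) / 2
        # Guard against division by zero when both values are 0.
        diffs[metric] = abs(a - b) / mean if mean else 0.0
    return diffs


# Placeholder values for illustration only; substitute your own QC output.
flowcell_1 = {"mapping_rate": 0.95, "duplicate_rate": 0.20, "frip": 0.30}
flowcell_2 = {"mapping_rate": 0.95, "duplicate_rate": 0.25, "frip": 0.30}

for metric, diff in relative_differences(flowcell_1, flowcell_2).items():
    print(f"{metric}: {diff:.1%}")
```

How small the differences need to be remains a judgment call; the table just makes that judgment easier to eyeball.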