I'm sure this has been answered previously, but I have been unable to find a directly similar question here for ChIP-seq (or similar) data.
We have a library which has been sequenced twice, on two flow cells, corresponding to one biological replicate sample from an experiment (each flow cell is an independent experiment). This library has the lowest complexity output from the group of samples assayed, so we would like to combine reads from the two flow cells.
Qualitatively, we have observed that reads from both flow cells appear highly similar (in terms of mapping stats, duplicates, complexity, signal-to-noise, FRiP, peaks called, UCSC genome browser, etc.) and plotting PCA with DiffBind shows these samples to be nearly identical with or without modeling batch.
Can we directly combine these reads by way of `cat` or similar? What about combining BAM files, as with `samtools merge`? How else can we assess the appropriateness of combining these libraries, and which would be the most practical method for doing so?
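
To make the two options concrete, here is a minimal sketch of what we have in mind. The file names are hypothetical, and the toy FASTQ records only stand in for the real reads from each flow cell. It assumes gzipped FASTQ input and relies on the fact that concatenated gzip members still form a valid gzip stream. The BAM route is shown only as a comment and assumes both inputs are coordinate-sorted.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy gzipped FASTQ files standing in for the two flow cells (hypothetical names).
workdir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | gzip > "$workdir/fc1.fastq.gz"
printf '@r3\nGGCA\n+\nIIII\n' | gzip > "$workdir/fc2.fastq.gz"

# Option 1: concatenate the gzipped FASTQ files before alignment.
cat "$workdir/fc1.fastq.gz" "$workdir/fc2.fastq.gz" > "$workdir/merged.fastq.gz"

# Sanity check: the merged read count should equal the sum of the inputs.
count_reads() { gzip -dc "$1" | awk 'END { print NR / 4 }'; }
fc1_reads=$(count_reads "$workdir/fc1.fastq.gz")
fc2_reads=$(count_reads "$workdir/fc2.fastq.gz")
merged_reads=$(count_reads "$workdir/merged.fastq.gz")
echo "fc1=$fc1_reads fc2=$fc2_reads merged=$merged_reads"

# Option 2 (not run here): merge the aligned, coordinate-sorted BAM files instead.
# samtools merge merged.bam fc1.sorted.bam fc2.sorted.bam
```

In either case, we assume duplicate marking and peak calling would be rerun on the combined data rather than carried over from the per-flow-cell results.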