Hi all,
I'm interested in learning more about best practices for merging RNA-seq technical replicates. I've read many Biostars posts on the matter, but I have a somewhat special case.
Background:
I sent RNA samples for sequencing, which were split across 4 lanes each as per normal practice. However, the sequencing depth for this run was much lower than expected and the sequencing core re-sequenced the same samples. Thus, I now essentially have 8x technical replicates per sample, 4x lanes from each of two runs. The repeat runs achieved much better depth.
Questions:
- In this case, should the lower depth data be discarded, or can these data still be used in combination with the updated runs?
- If combined, is there a need to mitigate batch effect? Aside from read depth, the samples have nearly identical statistics with respect to FastQC analysis, % genome alignment individually with STAR, etc.
- For combining, what stage is most appropriate? Aside from file sizes, is there any difference between merging .fastq files and merging .bam files? What about merging at the level of raw counts? In the past, I have merged .bam files from different lanes and found that the result was equivalent to summing the raw per-gene read counts across the replicates.
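To make the count-level option concrete, here is a minimal sketch of what I mean by summing raw reads per gene. It assumes each lane or run has already been quantified into its own per-gene count table and that the tables come from the same library. The function name, gene names, and numbers are made up for illustration and are not real data.

```python
from collections import Counter


def sum_technical_replicates(lane_counts):
    """Sum raw per-gene counts across count tables from the same sample.

    lane_counts: iterable of dicts mapping gene ID -> raw read count,
    one dict per lane/run (hypothetical input format).
    """
    total = Counter()
    for counts in lane_counts:
        # Counter.update adds to existing counts instead of overwriting them.
        total.update(counts)
    return dict(total)


# Hypothetical counts for one sample: a low-depth lane and a repeat-run lane.
run1_lane1 = {"GeneA": 10, "GeneB": 0, "GeneC": 3}
run2_lane1 = {"GeneA": 42, "GeneB": 5, "GeneC": 11}

merged = sum_technical_replicates([run1_lane1, run2_lane1])
print(merged)
```

A gene missing from one table is treated as zero in that table, so tables with slightly different gene sets can still be combined.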
Are they exactly the same samples? Or even the leftover library prep from the first run?
Great question! Input RNA isolate is identical, though I am unsure if the library prep was repeated between runs. I'll ask and let you know ASAP.