Hello,
I would like to know how you guys handle potential batch effects when the same samples are resequenced (FASTQ files).
Our client targeted 20 million reads for each of her samples. However, in the first run we obtained fewer than 20 million reads for three samples (sample_2, sample_3, and sample_7), so we resequenced those samples.
For the 1st run:
sample_id     obtained_reads_(millions)
sample_1      21.4
sample_2      11
sample_3      12
sample_4      35.5
sample_5      23.8
sample_6      29.4
sample_7      10
sample_8      23.8
sample_9      24.3
sample_10     18.6
For the 2nd run:
sample_id     obtained_reads_(millions)
sample_2      9
sample_3      8
sample_7      10
When it comes to downstream analysis, how would you handle those samples (sample_2, 3, and 7)? Would you just merge the runs, e.g. (with run1/ and run2/ holding the FASTQs from the 1st and 2nd runs; a loop over all three samples is sketched below)

cat run1/sample_2.fastq.gz run2/sample_2.fastq.gz > sample_2.merged.fastq.gz
Or would you first run a PCA or hierarchical clustering to see whether the two runs of each sample cluster together, and then decide whether to drop or merge the reads from the 2nd run?
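For the merge option, a minimal sketch covering all three resequenced samples would be something like this (run1/ and run2/ are hypothetical directories for the two runs, and the naming assumes single-end files; paired-end data would need the same treatment per R1/R2 file). Concatenated gzip streams are themselves valid gzip, so cat works directly on the compressed files:

# merge the 1st- and 2nd-run FASTQs for each resequenced sample
for s in sample_2 sample_3 sample_7; do
    cat run1/"${s}".fastq.gz run2/"${s}".fastq.gz > "${s}".merged.fastq.gz
done

# sanity check: each merged read count (4 lines per record)
# should equal the sum of the two runs
for s in sample_2 sample_3 sample_7; do
    printf '%s: ' "$s"
    zcat "${s}".merged.fastq.gz | awk 'END {print NR/4 " reads"}'
done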
Thanks, ATpoint. Oh, I have another quick question: does this apply to scRNA-seq (10x) data as well? Should sequencing replicates be merged just like in bulk RNA-seq?
Running the exact same library on the same kind of instrument (assuming no instrument glitch) will not add any technical artifacts.
You absolutely should merge any kind of data with UMIs: you don't want reads from the same cell, the same gene, and the same UMI to be counted as two molecules just because they were sequenced in different runs.
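For 10x data you don't even have to concatenate the FASTQs yourself: cellranger count accepts multiple comma-separated FASTQ directories and deduplicates on UMIs across all of them during counting. A hypothetical invocation (all paths below are placeholders):

# run1/fastqs, run2/fastqs, and the reference path are hypothetical
cellranger count --id=sample_2_merged \
    --transcriptome=/path/to/refdata \
    --fastqs=run1/fastqs,run2/fastqs \
    --sample=sample_2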