Hello, I am a student working on a project and have run into some conceptual difficulties (I realise this situation ideally shouldn't arise). The project compares 8 RNA-seq samples (4 vs 4) to test for differentially expressed genes. The problem is that 2 of the samples (one from each group) weren’t amplified to the same extent as the others (a difference of 5 additional PCR cycles).
Obviously, this is sub-optimal for a comparative study, so I was wondering what the best remedy would be, short of resequencing. I was considering either DESeq2, for its outlier handling, or dropping the two samples (doing a 3 vs 3). Additional information: the transcriptome was assembled de novo from these samples.
I think that if you remove PCR duplicates and use DESeq2, you shouldn't have a bias.
The user then runs the risk of removing actual signal. If you can unequivocally see on a PCA that those two samples are very different from their respective groups, I'd maybe include the amplification difference as a batch effect in the DESeq2 model.
Of course you would have a bias! Removing duplicates would incorrectly decrease the inferred expression level of highly expressed genes, for which duplicates are normal and expected. That is why rmdup is not recommended for RNA-seq (or for any NGS assay in which duplicates are expected to arise independently of PCR).
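To make the point about highly expressed genes concrete, here is a rough back-of-the-envelope sketch in Python. It assumes single-end reads, duplicates defined purely by start position, and reads sampled uniformly along the transcript; the function name and the numbers are made up for illustration and do not come from any real dataset or tool.

```python
def expected_unique_reads(n_reads: int, n_positions: int) -> float:
    """Expected number of reads left after position-based duplicate removal.

    Assumes each read starts at one of `n_positions` sites chosen uniformly
    at random, so at most one read per site survives deduplication.
    """
    prob_site_empty = (1.0 - 1.0 / n_positions) ** n_reads
    return n_positions * (1.0 - prob_site_empty)


# Hypothetical transcript with 1000 possible read start positions
n_positions = 1000

# Two hypothetical genes of the same length, 1000-fold apart in true read count
low_true, high_true = 100, 100_000

low_kept = expected_unique_reads(low_true, n_positions)
high_kept = expected_unique_reads(high_true, n_positions)

true_fold = high_true / low_true
dedup_fold = high_kept / low_kept

print(f"low gene:  {low_true} reads -> {low_kept:.1f} after dedup")
print(f"high gene: {high_true} reads -> {high_kept:.1f} after dedup")
print(f"fold change: true {true_fold:.0f}x, after dedup {dedup_fold:.1f}x")
```

Under these assumptions the lowly expressed gene loses only a few reads, while the highly expressed gene is capped at roughly the number of available start positions, so the apparent fold change between them collapses. Paired-end data makes the cap less severe, but the direction of the distortion is the same.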
But is there, in your opinion, a correct solution? Is including the batch effect in the model an acceptable approach?
It's a good solution.
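For what it's worth, here is a minimal sketch of what "including the batch effect" means at the level of the design matrix. In DESeq2 itself this would correspond to an R design formula along the lines of `~ batch + condition`; the Python below is not DESeq2 and only illustrates the idea with an ordinary least-squares fit on made-up log2 values for a single gene. The sample order, the `batch` coding, and all numbers are assumptions for illustration.

```python
import numpy as np

# 8 samples: 4 control followed by 4 treated (hypothetical ordering)
condition = np.array([0, 0, 0, 0, 1, 1, 1, 1])
# One differently amplified library per group, flagged as its own batch
batch = np.array([1, 0, 0, 0, 1, 0, 0, 0])

intercept = np.ones(8)
design_full = np.column_stack([intercept, batch, condition])  # batch + condition
design_reduced = np.column_stack([intercept, condition])      # condition only

# The batch term is only estimable if the design has full column rank
rank = np.linalg.matrix_rank(design_full)

# Toy log2 expression for one gene (invented values, no noise):
# baseline 5, batch shift 1.5, condition effect 2
log_expr = 5.0 + 1.5 * batch + 2.0 * condition


def fit(design, y):
    """Ordinary least squares; returns coefficients and residual sum of squares."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ coef) ** 2))
    return coef, rss


coef_full, rss_full = fit(design_full, log_expr)
coef_reduced, rss_reduced = fit(design_reduced, log_expr)

print("design rank:", rank)
print("with batch term:    coef =", np.round(coef_full, 3), "RSS =", round(rss_full, 3))
print("without batch term: coef =", np.round(coef_reduced, 3), "RSS =", round(rss_reduced, 3))
```

Because there is one affected sample in each group, the design is balanced: in this toy case the condition estimate is the same with or without the batch term, but leaving it out pushes the batch shift into the residuals. In a real analysis that extra unexplained variance is what would cost power, which is the argument for modelling the batch rather than ignoring it.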