Hi everyone!
I'm trying to analyze two single-cell samples (same tissue; one case, one control) following the Seurat tutorial, and I'm confused about some of the QC steps:
1.) Why do we need to 'remove cells with less than 5% ribosomal reads'?
2.) Should I run 'Predict doublets' before merging the two datasets, or can I run it after merging and after filtering on mito/ribo content?
3.) Do I need to remove batch effects with Harmony (or another method) before cell-type prediction and differential expression analysis? (I thought batch effects should be removed when I have biological replicates, e.g. between control_1 and control_2.)
Thank you!
1.) To be clear, since this isn't well defined in the Seurat tutorial you linked: the ribosomal content that tutorial refers to is ribosomal protein mRNA, not rRNA. Ribosomal protein transcripts should be captured and sequenced, and they are usually relatively abundant, so their fraction may serve as a sanity check on how well a cell's transcriptome was captured. In my opinion, however, the case for filtering cells based on these transcripts is less well established than the case for filtering on mitochondrial transcripts.
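To make the distinction concrete, here is a minimal Python sketch (numpy only) of computing each cell's percentage of mitochondrial and ribosomal-protein counts and turning them into a QC mask, roughly what Seurat's `PercentageFeatureSet` does with a gene-name pattern. It assumes human gene symbols, where mitochondrial genes start with "MT-" and ribosomal protein genes with "RPS"/"RPL". The "RP[SL]" prefix is only a heuristic and also matches a few non-ribosomal genes such as RPS6KA1. The function names and thresholds are hypothetical illustrations, not values from the tutorial.

```python
import re

import numpy as np

# Assumption: human gene symbols. Mouse data would need e.g. "^mt-" and "^Rp[sl]".
MITO_PATTERN = re.compile(r"^MT-")
RIBO_PATTERN = re.compile(r"^RP[SL]")  # heuristic: also catches e.g. RPS6KA1


def percent_feature_set(counts, genes, pattern):
    """Percentage of each cell's counts that come from genes matching `pattern`.

    counts: cells x genes array of raw UMI counts.
    genes:  gene symbols, one per column of `counts`.
    """
    mask = np.array([bool(pattern.match(g)) for g in genes])
    totals = counts.sum(axis=1)
    subset = counts[:, mask].sum(axis=1)
    # Empty barcodes (total == 0) get 0% instead of a division-by-zero NaN.
    return np.divide(100.0 * subset, totals,
                     out=np.zeros(len(totals)), where=totals > 0)


def qc_keep_mask(counts, genes, max_mito=5.0, min_ribo=None):
    """Boolean mask of cells passing QC (hypothetical, illustrative thresholds)."""
    keep = percent_feature_set(counts, genes, MITO_PATTERN) < max_mito
    if min_ribo is not None:  # optional, and less well justified than the mito filter
        keep &= percent_feature_set(counts, genes, RIBO_PATTERN) >= min_ribo
    return keep


# Toy data, made up for illustration: 2 cells x 4 genes.
genes = ["MT-CO1", "RPS3", "RPL10", "ACTB"]
counts = np.array([[10, 5, 5, 80],
                   [1, 30, 20, 49]])
print(percent_feature_set(counts, genes, RIBO_PATTERN))  # [10. 50.]
print(qc_keep_mask(counts, genes))                        # [False  True]
```

Keeping the ribosomal filter optional (`min_ribo=None` by default) mirrors the point above: the mitochondrial filter has a clearer rationale, while the ribosomal-protein fraction is more of a sanity check.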
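2.) I think it makes the most sense to predict doublets before integration, especially with tools that create artificial doublets by combining two cells from your data set as part of the prediction algorithm. You want the artificial doublets to reflect the real doublets that could occur in your samples as accurately as possible, and that is most feasible at the single-sample level: assuming each sample was captured in its own run, a real doublet can only form between two cells from the same sample, so artificial doublets built from a merged case + control object would include pairs that could never actually occur.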
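The per-sample argument can be illustrated with a toy Python sketch (numpy only). Artificial doublets are built only from random pairs of cells within one sample (here by summing their raw counts, which is an assumption; tools differ in how they combine profiles), and each real cell is scored by the fraction of artificial doublets among its k nearest neighbours in PCA space. This is loosely in the spirit of artificial-doublet tools such as DoubletFinder or scDblFinder, not a reimplementation of either; all function names, parameters (`n_pcs`, `k`) and the toy data are hypothetical.

```python
import numpy as np


def simulate_doublets(counts, n_doublets, rng):
    """Artificial doublets: summed raw counts of random cell pairs from ONE sample."""
    n_cells = counts.shape[0]
    i = rng.integers(0, n_cells, size=n_doublets)
    j = rng.integers(0, n_cells, size=n_doublets)
    same = i == j
    j[same] = (j[same] + 1) % n_cells  # never pair a cell with itself
    return counts[i] + counts[j]


def log_normalize(counts, scale=1e4):
    """Library-size normalization followed by log1p."""
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1
    return np.log1p(counts / totals * scale)


def doublet_scores(counts, n_doublets=None, n_pcs=10, k=20, seed=0):
    """Fraction of artificial doublets among each real cell's k nearest neighbours."""
    rng = np.random.default_rng(seed)
    n_cells = counts.shape[0]
    n_doublets = n_doublets or n_cells
    fake = simulate_doublets(counts, n_doublets, rng)
    data = log_normalize(np.vstack([counts, fake]).astype(float))
    data -= data.mean(axis=0)
    # PCA via SVD of the centred real + artificial matrix.
    u, s, _ = np.linalg.svd(data, full_matrices=False)
    n_pcs = min(n_pcs, s.size)
    pcs = u[:, :n_pcs] * s[:n_pcs]
    is_fake = np.arange(pcs.shape[0]) >= n_cells
    # Brute-force distances from each real cell to every point (fine for toy sizes only).
    d = np.linalg.norm(pcs[:n_cells, None, :] - pcs[None, :, :], axis=2)
    d[np.arange(n_cells), np.arange(n_cells)] = np.inf  # exclude the cell itself
    nn = np.argsort(d, axis=1)[:, :k]
    return is_fake[nn].mean(axis=1)


# Toy data, made up for illustration: two samples of 100 cells x 50 genes.
rng = np.random.default_rng(1)
samples = {name: rng.poisson(1.0, size=(100, 50)) for name in ("case", "control")}
# Score each sample on its own, BEFORE merging/integration, so every artificial
# doublet is a pair that could physically have occurred in that sample.
scores = {name: doublet_scores(m) for name, m in samples.items()}
```

Running the scoring inside a per-sample loop, rather than on the merged object, is the design choice that matters here; the mito/ribo filtering from point 1 can be applied before or after this step, as long as it is also done per sample.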