I have 25 single cell RNAseq 10X samples processed with CellRanger. Three of these samples appear to be contaminated with a large number of barcodes that have very few detected genes, which are probably "soup" (loose mRNA) rather than real cells. I want to filter out these low-quality cells, but I don't want to also remove low-expressing cells that I am interested in by raising the nFeature threshold too high.
Another bioinformatician in my lab suggested that I use the CellRanger filtered_feature_bc_matrix for these contaminated samples, rather than the raw_feature_bc_matrix. Is it advisable to use filtered_feature_bc_matrix for some of the samples and raw_feature_bc_matrix for the rest? Or should I use entirely one or the other?
I cannot really speak in CellRanger vocabulary since I use a different preprocessing pipeline, but I personally filter on four categories, using the count matrix that the preprocessing pipeline treats as the "filtered cells", i.e. the one where unreliable barcodes have already been removed.
1) Total number of detected genes (detected = counts > 0 per cell)
2) Total UMI counts per cell
3) % reads mapping to mt genes
4) % reads mapping to rRNA genes
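As a rough illustration, the four metrics could be computed like this from a cells x genes count matrix. This is only a sketch with plain NumPy: the function name `qc_metrics` and the `mt_genes` / `rrna_genes` arguments are made up for this example, and you have to supply the gene lists yourself from your annotation. With UMI data the percentages are fractions of UMI counts rather than of raw reads.

```python
import numpy as np

def qc_metrics(counts, gene_names, mt_genes, rrna_genes):
    """Per-cell QC metrics from a cells x genes matrix of counts.

    mt_genes / rrna_genes are user-supplied collections of gene names
    (hypothetical inputs for this sketch; take them from your annotation).
    """
    counts = np.asarray(counts)
    names = np.asarray(gene_names)

    # Boolean masks over the gene axis
    is_mt = np.isin(names, list(mt_genes))
    is_rrna = np.isin(names, list(rrna_genes))

    total = counts.sum(axis=1)
    # Avoid division by zero for completely empty barcodes
    safe_total = np.maximum(total, 1)

    return {
        "n_genes": (counts > 0).sum(axis=1),   # 1) detected genes
        "total_umi": total,                    # 2) total counts
        "pct_mt": 100.0 * counts[:, is_mt].sum(axis=1) / safe_total,      # 3)
        "pct_rrna": 100.0 * counts[:, is_rrna].sum(axis=1) / safe_total,  # 4)
    }

# Toy example: 3 cells, 4 genes (gene names are placeholders)
toy_counts = [[1, 0, 3, 0],
              [0, 0, 0, 4],
              [2, 2, 2, 2]]
toy_genes = ["A", "MT-X", "R1", "B"]
toy_metrics = qc_metrics(toy_counts, toy_genes, mt_genes={"MT-X"}, rrna_genes={"R1"})
```

For a real dataset you would use a sparse matrix and whatever QC helper your toolkit already provides; the point here is only what each of the four numbers means.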
If you plot each of these categories as a violin for all cells, you will be able to define cutoffs by eye (or automatically, e.g. 2-3x the median absolute deviation, MAD, away from the median) that separate outlier cells per category, which you can then flag. Removing cells which fail any of these categories is usually sufficient to get rid of crappy cells without being overly stringent in any individual category. Combining the filters means you do not need to be very strict in any single one. If you have low numbers of detected genes, this usually comes along with low(er) UMI counts.
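The automated variant could look roughly like the sketch below. It assumes you already have the four per-cell metrics as arrays in a dict; the names `mad_outliers`, `flag_cells` and the dict keys are made up for this example. The directions (low is bad for gene and UMI counts, high is bad for the mt and rRNA percentages) are my assumption, so adjust them to your data. It uses the raw MAD without the 1.4826 scaling constant that some packages apply.

```python
import numpy as np

def mad_outliers(values, n_mads=3.0, side="both"):
    """Flag values more than n_mads MADs away from the median."""
    values = np.asarray(values, dtype=float)
    med = np.median(values)
    mad = np.median(np.abs(values - med))  # raw MAD, no scaling constant
    lower = med - n_mads * mad
    upper = med + n_mads * mad
    if side == "lower":
        return values < lower
    if side == "upper":
        return values > upper
    return (values < lower) | (values > upper)

# Assumed direction of "bad" for each metric (not from any specific pipeline)
DEFAULT_SIDES = {
    "n_genes": "lower",
    "total_umi": "lower",
    "pct_mt": "upper",
    "pct_rrna": "upper",
}

def flag_cells(metrics, n_mads=3.0, sides=DEFAULT_SIDES):
    """Return a boolean array: True if a cell fails ANY category."""
    n_cells = len(next(iter(metrics.values())))
    fail = np.zeros(n_cells, dtype=bool)
    for name, side in sides.items():
        fail |= mad_outliers(metrics[name], n_mads=n_mads, side=side)
    return fail

# Toy example with 8 cells
example = {
    "n_genes":   [10, 11, 9, 10, 12, 10, 11, 1],
    "total_umi": [100, 110, 90, 100, 120, 100, 110, 95],
    "pct_mt":    [50, 6, 4, 5, 6, 5, 6, 5],
    "pct_rrna":  [1, 2, 1, 2, 1, 2, 1, 2],
}
failed = flag_cells(example)  # keep cells where failed is False
```

Count-like metrics are often log-transformed before computing the MAD, since they tend to be right-skewed. Either way, I would still look at the violins per sample to check that the automatic cutoffs land where you would have put them by eye.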