After running the standard Seurat workflow, I found that the top expressed markers in two clusters of my dataset are ribosomal genes.
My question is: should I remove these two clusters and then rerun normalization, scaling, etc. on the remaining cells? Or should I go back to the very beginning, grep the ribosomal genes out of my top VariableFeatures, and then perform dimension reduction and clustering with the remaining VariableFeatures? Which approach do you think makes more sense?
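To make the second option concrete, here is a minimal, language-agnostic sketch of the filtering step, written in Python rather than R. It assumes human gene symbols and the commonly used `^RP[SL]` prefix for ribosomal protein genes. The gene list and the `drop_ribosomal` helper are made up for illustration and are not part of Seurat.

```python
import re

# Assumption: human gene symbols, where ribosomal protein genes start with RPS or RPL.
RIBO_PATTERN = re.compile(r"^RP[SL]")

def drop_ribosomal(genes):
    """Return the genes that do NOT match the ribosomal prefix, preserving order."""
    return [g for g in genes if not RIBO_PATTERN.match(g)]

# Hypothetical list standing in for the top variable features.
variable_features = ["RPS6", "RPL13A", "EPCAM", "HBB", "RPLP0", "KRT8", "RPN1"]

kept = drop_ribosomal(variable_features)
print(kept)
```

Note that a prefix match is a blunt tool: it would also drop genes such as RPS6KA1, which encodes a kinase rather than a ribosomal protein, so it is worth inspecting what the pattern removes before reclustering.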
Hello! Thanks for the suggestion! It is our own sample, a PDX tumor sample actually. Interestingly, the two clusters with high ribosomal gene expression also show relatively high expression of cancer markers. In addition, a third cluster has high expression of HBB and HBD but also expresses cancer markers. Could all of this indicate that these three clusters come from ambient RNA contamination?
It's hard to tell intuitively whether these clusters are an 'artifact' or genuine. Generally, it would be rare for ambient RNA to give you distinct clusters. It mostly contaminates everything, which obscures or dampens a cluster's more ‘interesting’ features and makes them harder to detect in differential expression testing with FindAllMarkers (or similar).
Since these are your samples, I would suggest you try SoupX and see how the resulting clustering looks. If you’re lucky, the ambient RNA genes will be adjusted and you’ll still keep your 3 interesting clusters. Even if you don’t follow through with that analysis, it will most definitely be something reviewers ask for, since these are human tumour samples, which tend to be of ‘lower’ quality due to how they are extracted. So it's better to know early on how your samples look.
P.S. If the above answered your question, please upvote it 😊
Did it! I'll try SoupX and see where it leads! Thanks!