I am analysing a single-cell proteogenomics data set of 15239 cells. After demultiplexing the HTOs with the Seurat pipeline, 4148 cells are classified as doublets, 4921 as singlets and 6170 as negatives. Because the number of singlets was so low, I decided to include the negative cells in the analyses. Although the negative cells have fewer transcripts and genes than the singlets (first two figures), some of them still cluster together with various singlet subpopulations (third figure). According to the paper, negative cells reflect ambient RNA mixtures that may blend multiple subpopulations. Is it wrong to include the negative cells in the analyses?
NB: Setting the positive.quantile parameter to values other than the default led to a more drastic loss of singlets, so I left it at the default value.
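For reference, here is a quick sketch of how the class counts above break down as fractions of the data set. It is plain Python arithmetic on the numbers quoted in this post, not Seurat output; the names `counts` and `class_fractions` are just illustrative.

```python
# Cell counts per demultiplexing class, as quoted in the post.
counts = {"Doublet": 4148, "Singlet": 4921, "Negative": 6170}


def class_fractions(counts):
    """Return each class's share of the total number of cells."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


total = sum(counts.values())
fracs = class_fractions(counts)

print(f"Total cells: {total}")
for label, frac in fracs.items():
    print(f"{label}: {counts[label]} cells ({frac:.1%})")
```

So roughly a third of the cells are singlets and about 40% are negatives, which is why dropping the negatives would remove such a large part of the data.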