Question

Single-cell ambient RNA correction: SoupX vs decontX contamination fraction

10 months ago

txema.heredia ▴ 210

Hi,

I am analyzing two scRNA-seq samples. For the ambient RNA removal step I tested two tools, SoupX and DecontX, but they give very different estimates of the ambient RNA fraction in some cells. I know both are sensitive to the clustering provided as input, so I ran both with the same naive clustering from a preliminary Seurat run (19 clusters).

SoupX predicts a mean contamination (rho) of about 0.01 per cell, which then varies only slightly on a cell-by-cell basis:

summary(df_contamination$soupX_contamination)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
0.002320 0.009352 0.011108 0.010817 0.012374 0.037122

However, decontX predicts a much wider range of contamination, with certain cells reaching 95%:

summary(df_contamination$decontX_contaminationSeurat)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
0.0005773 0.0258687 0.0560367 0.0977059 0.1146198 0.9553540

And this is the comparison between methods:

image: soupX vs decontX all clusters

However, when looking cluster by cluster, I see that some clusters have a much higher predicted ambient fraction than others:

image: soupX vs decontX clusters 0 to 11

image: soupX vs decontX clusters 12 to 18
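A per-cluster comparison of this kind can also be put into a small table. The following is only a minimal sketch in base R on simulated values: the `cluster` column and all the numbers are assumptions made up for illustration, and only the two contamination column names match the ones used above.

```r
# Toy stand-in for df_contamination; all values are simulated, not real results.
set.seed(1)
n <- 300
df_contamination <- data.frame(
  cluster = factor(sample(0:2, n, replace = TRUE)),  # hypothetical column name
  soupX_contamination = runif(n, 0.002, 0.04),
  decontX_contaminationSeurat = rbeta(n, 1, 9)
)

# Median of each estimate within each cluster.
per_cluster <- aggregate(
  cbind(soupX_contamination, decontX_contaminationSeurat) ~ cluster,
  data = df_contamination,
  FUN = median
)

# Spearman correlation between the two estimates within each cluster.
# split() and aggregate() both order groups by factor level, so rows line up.
per_cluster$spearman <- sapply(
  split(df_contamination, df_contamination$cluster),
  function(d) cor(d$soupX_contamination,
                  d$decontX_contaminationSeurat,
                  method = "spearman")
)

print(per_cluster)
```

A low or negative within-cluster correlation would suggest the two tools disagree on which cells are contaminated, not only on the overall scale.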

I know that both methods use over/under-representation of markers in the soup vs markers in the cluster to ascertain which genes belong to which fraction.
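To make that idea concrete, here is a deliberately simplified toy calculation in base R. It is not the actual estimation procedure of SoupX or decontX; the gene names, counts, and the assumption that `geneA` is not expressed by this cell type are all made up for illustration.

```r
# Toy soup profile: fraction of ambient counts contributed by each gene (sums to 1).
soup_profile <- c(geneA = 0.50, geneB = 0.30, geneC = 0.20)

# Toy cell with 1000 UMIs. geneA is ASSUMED not to be expressed by this cell type,
# so every geneA count is attributed to the soup.
cell_counts <- c(geneA = 10, geneB = 600, geneC = 390)
n_umi <- sum(cell_counts)

# If a fraction rho of the cell's UMIs came from the soup, we would expect
# rho * n_umi * soup_profile["geneA"] counts of geneA. Solve for rho.
rho_hat <- unname(cell_counts["geneA"] / (n_umi * soup_profile["geneA"]))

# Expected ambient counts per gene, capped at the observed counts,
# and the counts left after subtracting them.
ambient_expected <- pmin(rho_hat * n_umi * soup_profile, cell_counts)
corrected <- cell_counts - ambient_expected

print(rho_hat)
print(corrected)
```

In this toy setting the estimate depends entirely on which genes are assumed to be absent from the cell, which is one reason the clustering or marker choice can shift the estimated fraction.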

However, my biggest concern is that this tissue is aortic aneurysm. We expect the cell dissociation process to be much harsher on some cell types than on others, probably the ones we are most interested in. This might leave the soup enriched for genes from those cell types, so that they end up overcorrected.

Still, decontX seems able to filter out literature-derived cell-type-specific markers in different clusters:

image: decontX celltype markers

Which of the two methods is more reliable? Which would work better in this case? Should I simply skip the ambientRNA detection step?

single-cell ambient-RNA • 3.0k views

updated 10 months ago by fracarb8 ★ 1.7k • written 10 months ago by txema.heredia ▴ 210

Very good question and analysis. I have personally only tried SoupX, so I can't weigh in on the comparison. This might not be the real issue here, but I suggest you check your clustering. I have no input on how much good clustering matters for proper ambient mRNA decontamination (maybe someone else can weigh in on that), but I do know that, when interpreting scRNA-seq data, naive Seurat clustering without any attempt to optimize the clustering resolution can be fraught with peril. I tried clustree, but found it very clunky and non-transparent. I have personally switched to cNMF as the clustering tool, and so far it has given me quite nice results. It just works.

ADD REPLY • link 10 months ago by e.r.zakiev ▴ 250

The point is that you need to give the ambient-removal algorithm a clustering as input. It then compares the genes present in the soup against the genes highly expressed in each cluster. Genes highly expressed in a cluster are kept (they should belong to the cell fraction of that population), while the remaining ambient genes are assumed to come from soup contamination and are "removed" from that cluster.

I suppose this helps when the soup is composed mainly of a subset of cell types, so that the signal from actual cells of those types is not removed.

But none of this explains the wild differences in ambient % estimation between the two methods, even when the clustering info is identical.

ADD REPLY • link 10 months ago by txema.heredia ▴ 210

score 0 · Answer 1 · 2024-05-24

10 months ago

txema.heredia ▴ 210

Update:

I ran CellBender on these samples and compared its results with SoupX and DecontX.

vs SoupX:

image: CellBender vs SoupX

vs DecontX:

image: CellBender vs DecontX

As you can see, the results with this method are, again, widely different.

Do you have any suggestions on which method is the most reliable? Should I go with CellBender just because it is the most recent one?