I am using SoupX to quantify, profile, and remove ambient mRNA contamination from my scRNA-seq data.
I tried running the automated workflow on my dataset and got the message below.
sc = load10X("/path/to/output")
sc = autoEstCont(sc)
# 127 genes passed tf-idf cut-off and 35 soup quantile filter. Taking the top 35.
# Using 252 independent estimates of rho.
# Estimated global rho of 0.75
# Error in setContaminationFraction(sc, contEst, forceAccept = forceAccept) :
# Extremely high contamination estimated (0.75). This likely represents a failure in estimating the contamination fraction. Set forceAccept=TRUE to proceed with this value.
I'm very new to bioinformatics and scRNA-seq analysis and am wondering how to proceed. What should I do to check whether this is "real" before moving on and correcting the expression profiles?
I've been trying to do some of the visual sanity checks mentioned in the vignette, but it seems I first need to use the "manual method" to estimate the contamination fraction. However, after reading through the vignette several times I'm still confused about the exact code I need to run. I keep running into the error "'x' must be an array of at least two dimensions".
Well, the vignette is indeed not very explicit about where to source the list of genes that should not under any circumstances be expressed in your cells, but it does hint at it. Essentially, at this point you need input from your wet lab telling you what kind of cells were sequenced. For instance, if my wet-lab team tells me they sequenced a pulmonary cell line, I know that immune cells can't be there by design, so I would include all immune-cell-related genes, such as Ptprc and Igkc, as contaminants.
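Roughly, the manual route could look like the sketch below. It assumes the two SoupX functions the vignette uses for this step, estimateNonExpressingCells and calculateContaminationFraction; the wrapper estimate_rho_manually and the gene set name "immune" are made up for illustration, and the two genes are only the examples from above, not a vetted marker list. Note that the genes go in as a named list of character vectors, not as a bare vector.

```r
# Hypothetical helper (not part of SoupX): estimate the contamination
# fraction from genes that should not be expressed in the sequenced cells.
estimate_rho_manually <- function(sc, gene_sets) {
  # SoupX expects a named list of gene sets, e.g. list(immune = c("Ptprc", "Igkc"))
  stopifnot(is.list(gene_sets), !is.null(names(gene_sets)))

  # Keep only genes that are actually present in the count matrix
  present <- lapply(gene_sets, intersect, rownames(sc$toc))
  present <- present[lengths(present) > 0]
  if (length(present) == 0) {
    stop("None of the supplied genes are in the count matrix.")
  }

  # Decide which cells can be used for the estimate;
  # clusters = FALSE makes this decision per cell instead of per cluster
  use_to_est <- SoupX::estimateNonExpressingCells(
    sc, nonExpressedGeneList = present, clusters = FALSE
  )

  # Estimate rho from those cells and store it in sc$metaData$rho
  SoupX::calculateContaminationFraction(sc, present, useToEst = use_to_est)
}

# Example gene set taken from the pulmonary cell line scenario above
immune_genes <- list(immune = c("Ptprc", "Igkc"))

# Usage (needs SoupX installed and a real Cell Ranger output folder):
# sc <- SoupX::load10X("/path/to/output")
# sc <- estimate_rho_manually(sc, immune_genes)
# unique(sc$metaData$rho)
```

If the value you get this way is very different from the 0.75 reported by autoEstCont, I would take that as a hint that the automatic estimate failed rather than that the contamination is real.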
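Also, are you providing both the raw and the filtered barcode matrices? It won't work with only the filtered counts, since the soup profile is estimated from the empty droplets in the raw matrix.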
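A quick way to check this before calling load10X is to look for the matrix folders in the output directory. This is a minimal base R sketch; the function name check_10x_dir is hypothetical, and it assumes the usual Cell Ranger folder names (raw_feature_bc_matrix and filtered_feature_bc_matrix, or raw_gene_bc_matrices and filtered_gene_bc_matrices for older versions).

```r
# Hypothetical helper: report whether a Cell Ranger output folder
# contains both the raw and the filtered matrix directories.
check_10x_dir <- function(data_dir) {
  # Folder names used by newer and older Cell Ranger versions
  raw_names  <- c("raw_feature_bc_matrix", "raw_gene_bc_matrices")
  filt_names <- c("filtered_feature_bc_matrix", "filtered_gene_bc_matrices")

  c(
    raw      = any(dir.exists(file.path(data_dir, raw_names))),
    filtered = any(dir.exists(file.path(data_dir, filt_names)))
  )
}

# Usage:
# check_10x_dir("/path/to/output")
```

If raw comes back FALSE, you only have the filtered counts in that folder and need to get the raw matrix from the Cell Ranger run first.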