Question

Using filtered_feature_bc_matrix for scRNAseq samples

3.9 years ago

theodore.killian ▴ 30

I have 25 single-cell RNA-seq 10X samples processed with CellRanger. Three of these samples appear to be contaminated with many low-gene-count cells, which are probably "soup" or loose mRNA. I want to filter out these low-quality cells, but I don't want to also remove the low-expressing cells I am interested in by raising the nFeature threshold too high.

Another bioinformatician in my lab suggested that I use the CellRanger filtered_feature_bc_matrix for these contaminated samples, rather than the raw_feature_bc_matrix. Is it advisable to use filtered_feature_bc_matrix for some of the samples and raw_feature_bc_matrix for the rest? Or should I use entirely one or the other?

rna-seq single-cell • 4.4k views

updated 3.9 years ago by rpolicastro 13k • written 3.9 years ago by theodore.killian ▴ 30

I cannot really speak to the Cell Ranger vocabulary since I use a different preprocessing pipeline, but I personally filter on four criteria, using the count matrix that the preprocessing pipeline treats as the "filtered cells", i.e. the one from which unreliable barcodes have already been removed.

1) Total number of detected genes (detected = counts > 0 per cell)

2) Total UMI counts per cell

3) % reads mapping to mt genes

4) % reads mapping to rRNA genes

If you plot each of these metrics as a violin plot across all cells, you can define cutoffs by eye (or automatically, e.g. 2-3x MAD from the median) that separate outlier cells per metric, which you can then flag. Removing cells that fail any of these criteria is usually sufficient to get rid of crappy cells without being overly stringent in any individual category: the combination of filters means you do not have to be very strict in any single one. Low numbers of detected genes usually come along with low(er) UMI counts.
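The automated variant of this can be sketched roughly as follows. This is only a minimal illustration in plain Python, not the code of any particular pipeline: the function name `mad_outliers`, the choice of 3 MADs, the toy metric values, and the decision to flag low outliers for gene/UMI counts and high outliers for the mt/rRNA percentages are all assumptions made for the example. The MAD here is the raw median absolute deviation, without the 1.4826 scaling constant some tools apply.

```python
from statistics import median

def mad_outliers(values, nmads=3.0, direction="both"):
    """Flag values more than `nmads` MADs away from the median.

    direction: "lower", "higher", or "both" (hypothetical helper, for illustration).
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])  # raw MAD, no scaling constant
    lower = med - nmads * mad
    upper = med + nmads * mad
    flags = []
    for v in values:
        too_low = direction in ("lower", "both") and v < lower
        too_high = direction in ("higher", "both") and v > upper
        flags.append(too_low or too_high)
    return flags

# Toy per-cell QC metrics for six cells (made-up numbers).
n_genes   = [2100, 1900, 2300, 2000, 150, 2200]   # detected genes (counts > 0)
total_umi = [6000, 5500, 7000, 5800, 300, 6500]   # total UMI counts
pct_mt    = [3.0, 4.0, 2.5, 3.5, 5.0, 35.0]       # % reads mapping to mt genes
pct_rrna  = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0]        # % reads mapping to rRNA genes

# One flag vector per metric; a cell is dropped if it fails any of them.
flag_sets = [
    mad_outliers(n_genes, direction="lower"),
    mad_outliers(total_umi, direction="lower"),
    mad_outliers(pct_mt, direction="higher"),
    mad_outliers(pct_rrna, direction="higher"),
]
keep = [not any(cell_flags) for cell_flags in zip(*flag_sets)]
print(keep)
```

In this toy data the fifth cell fails on both detected genes and UMI counts, and the sixth fails only on the mt percentage, so each would be removed even though the thresholds in any single category are fairly lenient. With real data it is still worth looking at the violin plots, since MAD-based cutoffs can behave oddly on skewed or bimodal distributions.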

3.9 years ago by ATpoint 85k

score 0 · Answer 1 · 2021-01-03

In 10X data most droplets do not contain cells. The filtered_feature_bc_matrix filters out the droplets that likely don't contain a cell, so it's usually the recommended starting point. In contrast, raw_feature_bc_matrix keeps information from all droplets that have a valid barcode, most of which, as stated above, contain no cell.
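To make the raw vs. filtered distinction concrete, here is a toy sketch. It is not Cell Ranger's cell-calling algorithm, which is more involved than a fixed cutoff; the barcodes, the UMI totals, and the `HYPOTHETICAL_MIN_UMI` threshold are all invented for illustration. The only point is that the filtered set is a small subset of the barcodes present in the raw matrix.

```python
# Toy total UMI counts per barcode, as one might summarise a "raw" matrix.
# All barcodes and counts are made up.
raw_totals = {
    "AAAC": 5200, "AAAG": 3, "AACT": 0, "AAGT": 4800,
    "ACGT": 7,    "AGGT": 1, "CCTA": 6100, "CGTA": 2,
}

# Assumption: a fixed cutoff stands in for real cell calling, purely for illustration.
HYPOTHETICAL_MIN_UMI = 500

# "Filtered" barcodes: droplets that plausibly contain a cell.
filtered_barcodes = sorted(
    bc for bc, total in raw_totals.items() if total >= HYPOTHETICAL_MIN_UMI
)
fraction_kept = len(filtered_barcodes) / len(raw_totals)

print(filtered_barcodes)
print(f"{fraction_kept:.1%} of raw barcodes kept")
```

In real 10X runs the imbalance is far larger than in this toy example, which is why starting from the raw matrix means handling a very large number of near-empty barcodes yourself.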

Refer to @ATpoint's comment for advice on further processing after loading the 10X data into Seurat.