Seeking Advice on Pre-processing Single Cell Data for Gene Mapping
9 months ago
monklod ▴ 10

Dear Bioinformaticians,

I am struggling with my single-cell data analysis, as I have never worked with single-cell data before, and I need some guidance on the pre-processing steps before mapping against two reference genes.

Here's a brief overview of my scenario: I have single-cell data at hand, but lack clear steps for what to do before mapping. On the experimental side, previous FISH analysis indicates non-uniform expression of the two genes I am looking into. My plan is to map the single-cell reads and subsequently calculate the relative expression of the two genes. However, before diving into mapping, I need to assess data quality and filter out any subpar reads/cells.
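To make the "relative expression" goal concrete, here is a minimal base R sketch of one possible definition: the fraction of the two-gene total contributed by one gene in each cell. The gene names, cell names, counts, and the `relative_expression` helper are all made up for illustration; the post does not say which definition is intended.

```r
# Toy count matrix: rows are genes, columns are cells (values are made up).
counts <- matrix(
  c(10, 0, 3,
     5, 0, 9),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), c("cell1", "cell2", "cell3"))
)

# Hypothetical helper: share of the two-gene total that comes from gene_a, per cell.
relative_expression <- function(counts, gene_a, gene_b) {
  total <- counts[gene_a, ] + counts[gene_b, ]
  # Cells with no counts for either gene have no defined ratio.
  ifelse(total > 0, counts[gene_a, ] / total, NA_real_)
}

rel <- relative_expression(counts, "geneA", "geneB")
```

Cells in which neither gene is detected come back as `NA` rather than 0, so they can be excluded or reported separately instead of being read as "no geneA".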

Here's where I need your expertise:

  • Quality Assessment Pipeline: What steps or pipelines do you recommend for evaluating the quality of single-cell data? Are there specific metrics or tools?
  • Filtering Criteria: How can I effectively identify and filter out low-quality reads and cells?

This is what I have come up with so far: Quality Control (QC): I have the basic stats of the run and library preparation from 10x. I would also like to go ahead with the following (it is more of a rough sketch for now, but I'd appreciate any insight):

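library(Matrix)
library(SingleCellExperiment)
library(scuttle)

# Load single-cell count matrix (genes x cells) in a sparse column format
counts <- as(readMM("single_cell_counts.mtx"), "CsparseMatrix")
# The .mtx file has no gene names; "features.tsv" is an assumed file name
# (10x layout: gene ID in column 1, gene symbol in column 2)
rownames(counts) <- make.unique(read.delim("features.tsv", header = FALSE)$V2)

# Create SingleCellExperiment object
sce <- SingleCellExperiment(assays = list(counts = counts))

# Quality control analysis: per-cell metrics are stored in colData(sce)
is_mito <- grepl("^MT-", rownames(sce)) # Assumes human gene symbols ("mt-" in mouse)
sce <- addPerCellQC(sce, subsets = list(mt = is_mito))

# Cell filtering: cells are columns, so subset with sce[, keep]
sce <- sce[, sce$detected >= 200] # Keep cells with at least 200 detected genes
sce <- sce[, sce$sum >= 1000] # Keep cells with at least 1000 counts
sce <- sce[, sce$subsets_mt_percent < 20] # Keep cells with < 20% mitochondrial counts

# Normalization
sce <- logNormCounts(sce)

# Gene filtering: per-gene metrics are stored in rowData(sce)
sce <- addPerFeatureQC(sce)
sce <- sce[rowData(sce)$mean >= 0.1, ] # Keep genes with mean count >= 0.1
# scuttle does not compute a "dispersion_empirical" column; a dispersion filter
# needs that value estimated first, e.g. from scran::modelGeneVar(sce)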

Then, I would like to map the filtered reads against my reference. Here I am also unsure what format the filtered reads come in and how to distinguish between different UMIs (to identify the mapped reads coming from the same cell).
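On the question of telling cells and molecules apart: in 10x 3' libraries the cell barcode and the UMI are both carried at the start of read 1, with the barcode identifying the cell and the UMI identifying the original molecule. The base R sketch below splits read 1 sequences accordingly. It assumes v3 chemistry (16 bp barcode followed by a 12 bp UMI; v2 uses a 10 bp UMI), and the sequences and the `split_read1` helper are made up for illustration. In practice this bookkeeping is normally done by the alignment software rather than by hand.

```r
# Hypothetical helper: split 10x read 1 sequences into cell barcode and UMI.
# Default lengths assume 10x 3' v3 chemistry (16 bp barcode, 12 bp UMI).
split_read1 <- function(seq, barcode_len = 16, umi_len = 12) {
  stopifnot(all(nchar(seq) >= barcode_len + umi_len))
  data.frame(
    barcode = substr(seq, 1, barcode_len),
    umi = substr(seq, barcode_len + 1, barcode_len + umi_len),
    stringsAsFactors = FALSE
  )
}

# Made-up read 1 sequences: the first three share a cell barcode,
# and the first two also share a UMI (duplicates of one molecule).
read1 <- c(
  "AAACCCAAGAAACACTTTTGGGCCCAAA",
  "AAACCCAAGAAACACTTTTGGGCCCAAA",
  "AAACCCAAGAAACACTACGTACGTACGT",
  "TTTGTTGTCTTTGATCACGTACGTACGT"
)

parsed <- split_read1(read1)

# Number of distinct UMIs seen for each cell barcode.
umis_per_cell <- tapply(parsed$umi, parsed$barcode, function(u) length(unique(u)))
```

Reads with the same barcode come from the same cell, and reads with the same barcode and UMI are counted once. Real pipelines additionally collapse UMIs per gene after alignment, which this sketch does not attempt.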

Any insights, suggestions, or recommended resources? Thank you in advance!

Monika

9 months ago
ATpoint 86k

These are all very common and standard steps, outlined for example in https://bioconductor.org/books/release/OSCA/

Note that "preprocessing" and "mapping" are terms commonly used for the initial alignment of the data, e.g. with software such as CellRanger. Lingo is important if you talk to people in the field.
