Question

Why are some genes more prone to the dropout effect in scRNA-seq?

4.0 years ago

Ergün ▴ 20

Dear Community,

scRNA-seq analysis has its own conventions. For example, CD56 is a canonical protein marker for identifying Natural Killer (NK) cells. However, in scRNA-seq analysis, NCAM1 (the gene encoding CD56) is not commonly used. Instead, GNLY and NKG7 are the popular markers for identifying the NK cell cluster in t-SNE or UMAP plots.

To look into this, I examined several scRNA-seq datasets generated with different technologies, such as 10X Chromium (v1, v2, v3) and Smart-seq2. It turns out that NCAM1 expression is very low even in FACS-gated (CD56+) NK cells, whereas GNLY and NKG7 are readily detected in the same cells.

My question is: why are some genes more prone to the dropout effect? What are the biological or technical explanations behind this?

Thank you for your contributions in advance.

scRNA-seq dropout • 2.5k views

updated 4.0 years ago by Friederike 9.0k • written 4.0 years ago by Ergün ▴ 20

score 7 · Accepted Answer · 2021-12-01

The biggest factor is the number of mature transcripts present in a given cell. Some genes may be translated very quickly and/or encode very stable proteins, which would lead to relatively low numbers of free mature transcripts despite relatively high protein levels. Some genes are also transcribed in short bursts, whereas others are transcribed more continuously; continuously transcribed genes are more likely to have transcripts present in the majority of cells at any given time point.
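The bursting argument can be illustrated with a small calculation. This is only a sketch under assumptions that are mine, not measurements: steady transcription is modelled as a Poisson count, bursty transcription as a negative binomial count with the same mean, and the mean of 2 transcripts per cell and the dispersion of 0.2 are arbitrary illustrative values.

```python
import math

def zero_fraction_poisson(mean):
    """Fraction of cells with zero transcripts under a Poisson model (steady transcription)."""
    return math.exp(-mean)

def zero_fraction_negbin(mean, dispersion):
    """Fraction of cells with zero transcripts under a negative binomial model.

    A small dispersion (size) parameter means more cell-to-cell variability,
    used here as a stand-in for bursty transcription.
    """
    return (dispersion / (dispersion + mean)) ** dispersion

# Hypothetical values chosen only for illustration.
mean_transcripts = 2.0
steady = zero_fraction_poisson(mean_transcripts)
bursty = zero_fraction_negbin(mean_transcripts, dispersion=0.2)

print(f"steady gene: {steady:.2f} of cells have no transcript")
print(f"bursty gene: {bursty:.2f} of cells have no transcript")
```

Under these assumptions both genes have the same average expression, but the bursty one is absent from a much larger share of cells before any technical loss is considered.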

GC content may also play a role (very high or very low GC content will negatively impact PCR efficiency).

I can also imagine that some transcripts might be less amenable to poly-A-based capture, maybe due to secondary structures or additional interacting factors.

EDIT: This link provides a good run-down of the technical aspects limiting gene capture rates in general: https://www.quora.com/What-causes-genes-to-drop-out-of-single-cell-RNAseq-data-To-what-extent-can-they-be-recovered-by-sequencing-more-deeply?share=1

In short, the main reasons for drop-outs of individual genes in individual cells are (a) low transcript abundance, (b) capture inefficiency, and (c) amplification/sequencing bias.
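To make the interplay of (a) and (b) concrete, here is a minimal sketch assuming each transcript molecule is captured independently with the same probability. Both the independence assumption and the 10% capture efficiency are simplifications picked for illustration, not properties of any particular protocol, and `dropout_probability` is a hypothetical helper name.

```python
def dropout_probability(n_transcripts, capture_efficiency):
    """Probability that none of n_transcripts molecules is captured.

    Assumes independent capture of each molecule with the same efficiency.
    """
    return (1.0 - capture_efficiency) ** n_transcripts

# Hypothetical capture efficiency, chosen only for illustration.
efficiency = 0.10

for n in (1, 5, 20, 100):
    p = dropout_probability(n, efficiency)
    print(f"{n:>3} transcripts per cell -> dropout probability {p:.4f}")
```

With these numbers, a gene with only a handful of transcripts per cell is missed in most cells, while a gene with around a hundred transcripts is almost always detected. That would be consistent with a low-abundance transcript such as NCAM1 dropping out while abundant ones such as GNLY and NKG7 do not.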