Question

Handling single-cell RNA-seq data with very few cells and possible conversion to bulk-like RNA-seq

3.8 years ago

mb86 • 0

I have a dataset (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE158442) with very few cells for certain cases after the QC and clustering steps:

[Image not available]

I was wondering how reliable it would be to use counts from so few cells at the case level.

The original authors divide the cases into 4 different stages, so they end up with a good number of cells per stage per cell type. However, I am more interested in individual cases.

A second, related question: I don't have much interest in expression values across different cells for the specific set of genes I am interested in, as they seem to have similar patterns across all cell types. So what would be a good strategy to convert the raw counts across all cells, for each individual case, to a bulk-like RNA-seq format, so that I can also use cells that are filtered out due to multiplet detection and other purity measures during QC and clustering?

RNA-Seq • 1.3k views

updated 3.8 years ago by ATpoint 86k • written 3.8 years ago by mb86 • 0

Scuttle has a function to aggregate counts per cluster (or celltype, or whatever classification you use to define cells belonging to the same group) by summing them up: https://rdrr.io/github/LTLA/scuttle/man/aggregateAcrossCells.html
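To make the idea of summing counts per group concrete, here is a minimal plain-Python sketch. It is not the scuttle implementation, only an illustration of the same summation; the function name `pseudobulk`, the input layout, and the toy data are all made up for this example.

```python
from collections import defaultdict

def pseudobulk(counts, groups):
    """Sum per-cell raw counts into one profile per group.

    counts: dict of cell ID -> dict of gene -> raw count (missing genes count as 0)
    groups: dict of cell ID -> group label (e.g. a case, or a case/cell-type pair)
    Returns the summed counts per group and the number of cells behind each sum.
    """
    profiles = defaultdict(lambda: defaultdict(int))
    n_cells = defaultdict(int)
    for cell, gene_counts in counts.items():
        label = groups[cell]
        n_cells[label] += 1
        for gene, value in gene_counts.items():
            profiles[label][gene] += value
    return {g: dict(p) for g, p in profiles.items()}, dict(n_cells)

# Toy data, invented for illustration only
counts = {
    "cell1": {"GeneA": 3, "GeneB": 0},
    "cell2": {"GeneA": 1, "GeneB": 2},
    "cell3": {"GeneA": 5},
}
groups = {"cell1": "case1", "cell2": "case1", "cell3": "case2"}

profiles, n_cells = pseudobulk(counts, groups)
print(profiles)
print(n_cells)
```

Keeping the number of cells per group next to the summed counts is handy later, when deciding which pseudobulks are too small to trust.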

The advantage is that you can then use established software such as DESeq2, and you get rid of the sparseness of the data and the per-cell dropout events. Since "sample_source" seems to list several donors, you probably have biological replicates, so a standard DE analysis at the pseudobulk level can be done.

As each cell has been sequenced individually, the differences in sequencing depth between pseudobulks (when clusters have very different cell numbers) should, in my limited experience, be easy to remove with standard normalization approaches as in DESeq2 or edgeR. The differences are mostly technical, and pseudobulks with lower total depth probably do not have more dropouts (which is the usual concern in bulk RNA-seq when depth is very different, but this probably does not apply here).

You might want to exclude pseudobulks with very low cell counts, like below 50 or so I guess. A PCA at the pseudobulk level might help to see whether the bulks with low cell numbers show clustering that looks "suspicious" in terms of being confounded by the low cell numbers. If not, then yeah, include them and see whether the results make sense. You can always go back and do more filtering. Does that make sense to you?
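As a rough illustration of the filtering and depth normalization mentioned above, the sketch below drops pseudobulks built from too few cells and then computes simplified median-of-ratios size factors, which is the idea behind DESeq2's size factor estimation. This is plain Python, not DESeq2 itself; the helper names, the toy numbers, and the default cutoff of 50 cells (taken from the rough guess above) are assumptions for the example.

```python
import math
from statistics import median

def drop_small(profiles, n_cells, min_cells=50):
    """Keep only pseudobulks built from at least min_cells cells."""
    return {g: p for g, p in profiles.items() if n_cells[g] >= min_cells}

def size_factors(profiles):
    """Simplified median-of-ratios size factors, one per pseudobulk.

    Only genes with a count > 0 in every pseudobulk are used as reference.
    """
    samples = list(profiles)
    genes = set.intersection(*(set(p) for p in profiles.values()))
    log_geo_mean = {}
    for gene in genes:
        values = [profiles[s][gene] for s in samples]
        if all(v > 0 for v in values):
            log_geo_mean[gene] = sum(math.log(v) for v in values) / len(values)
    if not log_geo_mean:
        raise ValueError("no gene is expressed in every pseudobulk")
    return {
        s: math.exp(median(math.log(profiles[s][g]) - log_geo_mean[g]
                           for g in log_geo_mean))
        for s in samples
    }

# Toy pseudobulk counts and cell numbers, invented for illustration only
profiles = {
    "case1": {"GeneA": 10, "GeneB": 20, "GeneC": 30},
    "case2": {"GeneA": 20, "GeneB": 40, "GeneC": 60},
    "case3": {"GeneA": 1, "GeneB": 0, "GeneC": 2},
}
n_cells = {"case1": 120, "case2": 240, "case3": 8}

kept = drop_small(profiles, n_cells)
factors = size_factors(kept)
normalized = {s: {g: c / factors[s] for g, c in p.items()} for s, p in kept.items()}
print(sorted(kept))
print(factors)
```

In this toy case, case2 has exactly twice the counts of case1, so its size factor comes out twice as large and the normalized profiles match. For a real analysis, the pseudobulk count matrix would go straight into DESeq2 or edgeR rather than through hand-rolled code like this.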