Question

Producing Bulk samples from 10X data

21 months ago

rohitsatyam102 ▴ 940

Hi Everyone

I am aware of an approach called pseudobulking in single-cell analysis, where bulk-like samples are generated from scRNA-seq data (in the absence of bulk data) to find which genes might be important at the population level. But my boss asked for something different, and I am not sure it is a correct way to generate bulks.

I was asked to sample 60% of the total reads from the FASTQ files of 10X data (UMI data, 3' chemistry) to generate three replicates per sample, align them to the Plasmodium reference, run DESeq2 for DE analysis, and check the overlap of the resulting DEGs with the DEGs obtained from scRNA-seq (all clusters combined). I did what was asked of me, and I get seemingly ideal biological replicates. But the dispersion estimates look weird (I understand there will be almost no dispersion, given that the replicates are nearly identical). I observe that nearly 66% of the detected genes are differentially expressed. Besides, 60% of the scRNA-seq DEGs overlap with these artificial bulk-derived DEGs. So is this good?

I am confused about whether what I have been asked to do is even legitimate.

rnaseq scrnaseq deseq2 bulk seurat • 815 views

updated 21 months ago by ATpoint 88k • written 21 months ago by rohitsatyam102 ▴ 940

score 2 · Answer 1 · 2023-09-10

The dispersion plot, as you say, is expected, because you are creating pseudoreplication. Pseudobulks are typically created from the count matrix: you sum raw counts per cluster, cell type, group, or whatever makes sense. The pseudoreplication you are creating makes no sense to me. If you don't have replication, you cannot make it up.
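
As a rough illustration of the count-matrix approach, a minimal Python sketch might look like the following. The function name `pseudobulk`, the toy barcodes, the gene names, and the group labels are all made up for illustration. Grouping by biological sample plus cluster is an assumption here; any grouping that makes sense for your design would go in the same place.

```python
from collections import defaultdict


def pseudobulk(counts, cell_to_group):
    """Sum raw per-cell counts into one profile per group.

    counts:        dict mapping cell barcode -> {gene: raw UMI count}
    cell_to_group: dict mapping cell barcode -> group label
                   (hypothetical labels such as "sample1_cluster0")
    """
    bulk = defaultdict(lambda: defaultdict(int))
    for cell, gene_counts in counts.items():
        group = cell_to_group[cell]
        for gene, n in gene_counts.items():
            # Raw counts are added up; no normalization happens at this stage.
            bulk[group][gene] += n
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {group: dict(genes) for group, genes in bulk.items()}


# Toy data, invented for this example only.
counts = {
    "cell1": {"geneA": 3, "geneB": 0},
    "cell2": {"geneA": 1, "geneB": 2},
    "cell3": {"geneA": 0, "geneB": 5},
}
cell_to_group = {
    "cell1": "sample1_cluster0",
    "cell2": "sample1_cluster0",
    "cell3": "sample2_cluster0",
}

result = pseudobulk(counts, cell_to_group)
print(result)
```

Each group then becomes one column of a bulk-like count matrix. Note that the columns only count as replicates if they come from independent samples; subsampling reads from the same library does not add any.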