4.2 years ago
asumani
Hi all,
I have scRNA-seq data prepared with the Smart-seq2 protocol. I have been using HISAT-StringTie-Ballgown (Pertea et al. 2016) for my bulk RNA-seq data.
For scRNA-seq, I have aligned and assembled transcripts with HISAT-StringTie. Yet I believe normalization and further processing of transcripts is different for scRNA-seq. Any suggestions for how to process these data?
Best wishes,
Asuman
Is this a model organism that you work with? I would not assemble transcripts from sparse single-cell data; there is little you can actually gain given that you have a reference transcriptome.
These are human B cells. There are 88 samples subsetted from the 973 cells sequenced in the initial, publicly available data. It is a complex study design, so I took only the cells I am interested in (B cells of the IgE isotype).
Could you explain a bit more of your comment?
Well, transcript assembly (like every assembly) requires good coverage to be reliable. Single-cell data are inherently sparse per cell, and if this is human, for which an excellent reference genome/transcriptome exists, then there is no reason imho to even bother with transcriptome assembly. I personally would use salmon to quantify the reads against a reference transcriptome, but you can also use something like featureCounts to get your count matrix. From there on you should follow guided tutorials, e.g. https://osca.bioconductor.org/ or the Seurat vignette (but I strongly encourage OSCA). So you have 88 cells now, what exactly do you want to compare?
I have no idea why my reply is not saved here.. You can find it below.
Thanks a lot!
I misused the word 'assembly', I am sorry. I have already 'merged' transcripts using StringTie and created a count matrix. For bulk RNA-seq, I normally use Ballgown for differential expression analysis; it is the R package used in the third step of this established pipeline. So my concern is whether I should switch to an scRNA-seq-specific tool for the normalization/differential expression steps.
There are 20 A-type and 68 B-type cells among the 88 cells in total. I am looking at DE between these two groups. I hope you can give additional tips regarding statistics/tools given that information.
I'd second the recommendation of the OSCA book @ATpoint linked. It has a solid explanation of differential expression in single-cell data and of how pseudobulk replicates can be used to reap the benefits of both the resolution of single-cell data and the robustness of bulk RNA-seq methods when comparing between conditions (Chapter 14). In this case, though, basic marker finding is probably all you want/need, and the book has a chapter on that too (Chapter 11).
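To make the pseudobulk idea a bit more concrete, here is a minimal Python sketch of the aggregation step only. It is not taken from OSCA (which works in R/Bioconductor), and the function name `pseudobulk`, the donor labels, the gene names, and all counts are made up for illustration. It assumes you have raw per-cell counts plus a donor and a group label for each cell, and it sums the counts of all cells sharing the same (donor, group) pair.

```python
from collections import defaultdict


def pseudobulk(counts, cell_meta):
    """Sum per-cell raw counts into one profile per (donor, group) pair.

    counts:    dict mapping cell ID -> dict of gene -> raw read count
    cell_meta: dict mapping cell ID -> (donor, group) tuple
    Both structures are hypothetical stand-ins for a real count matrix
    and its cell annotation table.
    """
    profiles = defaultdict(lambda: defaultdict(int))
    for cell_id, gene_counts in counts.items():
        key = cell_meta[cell_id]
        for gene, n in gene_counts.items():
            profiles[key][gene] += n
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {key: dict(genes) for key, genes in profiles.items()}


# Toy data: every ID, donor label, and count below is invented.
toy_counts = {
    "cell1": {"GENE_X": 5, "GENE_Y": 0},
    "cell2": {"GENE_X": 3, "GENE_Y": 2},
    "cell3": {"GENE_X": 0, "GENE_Y": 7},
    "cell4": {"GENE_X": 1, "GENE_Y": 4},
}
toy_meta = {
    "cell1": ("donor1", "A"),
    "cell2": ("donor1", "A"),
    "cell3": ("donor1", "B"),
    "cell4": ("donor2", "B"),
}

bulk = pseudobulk(toy_counts, toy_meta)
for key in sorted(bulk):
    print(key, bulk[key])
```

The summed profiles would then be handed to a bulk method such as edgeR or DESeq2, with each (donor, group) profile treated as one replicate. This only makes sense if the cells come from several donors or samples; if they all come from one, there are no replicates to aggregate into, and plain marker detection between the two groups is the more realistic option.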