Hi all, I have just started working with single-cell data, so I apologise if this question seems nonsensical. I have count data from this paper (https://doi.org/10.1371/journal.pbio.3001017) in which, for example, there are 9 zygote samples and 7 oocyte samples. My biological question concerns splicing patterns in these stages, which I plan to obtain by passing the counts file through SUPPA. Does it make sense to average the expression levels of transcripts within each stage (e.g. zygote) with the simple mean() function in R, or is this inappropriate because it treats the single-cell data as bulk? Is there a more appropriate way of doing this, or is it best not to average expression at all? Any insight would be appreciated. Thanks in advance for your help!
The usual way to pseudobulk cells, as far as I know, is to sum the counts across cells rather than average them. Keep in mind that single-cell data are often 3'-tagged, so reliable splicing detection might be difficult.
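To make the summing idea concrete, here is a minimal sketch in Python using only the standard library. The sample names, transcript IDs and counts are made up for illustration, and `pseudobulk_sum` is a hypothetical helper, not a function from any package. It only shows the aggregation step; whether the summed values are a suitable input for SUPPA is a separate question to check against the SUPPA documentation.

```python
from collections import defaultdict


def pseudobulk_sum(counts, stage_of):
    """Sum per-cell counts within each stage.

    counts:   {transcript_id: {sample_id: count}}
    stage_of: {sample_id: stage_label}
    Returns   {transcript_id: {stage_label: summed_count}}
    """
    result = {}
    for transcript, per_sample in counts.items():
        totals = defaultdict(int)
        for sample, value in per_sample.items():
            # Add this cell's count to the running total for its stage.
            totals[stage_of[sample]] += value
        result[transcript] = dict(totals)
    return result


# Toy data (assumed, not from the paper): two zygote cells, one oocyte cell.
stage_of = {"zygote_1": "zygote", "zygote_2": "zygote", "oocyte_1": "oocyte"}
counts = {
    "tx_A": {"zygote_1": 3, "zygote_2": 5, "oocyte_1": 2},
    "tx_B": {"zygote_1": 0, "zygote_2": 1, "oocyte_1": 7},
}

pseudobulk = pseudobulk_sum(counts, stage_of)
print(pseudobulk)
```

Summing keeps the result on the count scale, so stages with more cells end up with larger totals, much like a more deeply sequenced bulk library.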
Thank you for your reply. Sorry to ask this, but could you explain how 3' tagging can affect splicing detection, please?
Well, if you're only sequencing the very end (the 3' end) of transcripts, how are you going to detect any of the splice junctions that appear in the middle or at the beginning (5' end) of an RNA transcript?
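A toy calculation of the same point, as a rough sketch only: the exon lengths and read span below are invented, both function names are hypothetical, and the model (a single read anchored exactly at the 3' end) is a deliberate oversimplification of real 3'-tagged protocols.

```python
def junction_distances_from_3prime(exon_lengths):
    """Distance (in bases) from the 3' end to each exon-exon junction.

    exon_lengths is given in 5'->3' order; a transcript with n exons
    has n - 1 junctions.
    """
    total = sum(exon_lengths)
    distances = []
    position = 0
    for length in exon_lengths[:-1]:
        position += length  # junction position measured from the 5' end
        distances.append(total - position)
    return distances


def junctions_covered(exon_lengths, read_span):
    """True for each junction that a read covering the last `read_span`
    bases of the transcript would cross."""
    return [d < read_span for d in junction_distances_from_3prime(exon_lengths)]


# Hypothetical 4-exon transcript, 1050 bases in total.
exons = [200, 150, 300, 400]

distances = junction_distances_from_3prime(exons)
three_prime_only = junctions_covered(exons, read_span=500)
full_length = junctions_covered(exons, read_span=sum(exons))

print(distances)         # [850, 700, 400]
print(three_prime_only)  # [False, False, True]
print(full_length)       # [True, True, True]
```

In this toy case a read limited to the last 500 bases sees only the junction nearest the 3' end, while a read spanning the whole transcript crosses all three.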
It looks like that paper used nanopore sequencing, so the full length of each transcript should be sequenced, in which case you can do splicing/isoform detection analysis.
Thank you for this. I went back to do some more reading and this makes a lot of sense!