The project "Tabula Sapiens" (https://tabula-sapiens-portal.ds.czbiohub.org/) provides scRNA-seq data for a huge number of single cells (nearly 500,000 cells from 24 organs of 15 normal human subjects). The count matrices can be downloaded here: https://figshare.com/articles/dataset/Tabula_Sapiens_release_1_0/14267219
I took a brief look at the data (https://www.kaggle.com/alexandervc/look-on-tablasapiens-bone-marrow) and am a bit puzzled by the following:
Question: What is the format of these count matrices: raw counts, log(1 + counts), or something else?
The values are, e.g., 1.6892005 and 1.7121489.
If I sum over genes the values themselves, 2**(values), or exp(values), none of these sums corresponds to 'n_counts_UMIs', which is provided as a separate column. So it seems to me that the data are neither raw counts nor log(counts).
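One way to test the candidate formats is to invert each one and compare the per-cell sums with the reported totals. Note that the inverse of log(1 + x) is exp(x) - 1, not exp(x), so summing exp(values) over genes overshoots the count total by one per gene. The sketch below is a minimal numpy version under stated assumptions: `guess_transform` is a hypothetical helper name, it expects a dense cells × genes array (take a subset of a sparse matrix and call `.toarray()` first), and the "constant per-cell sum" check assumes the common pattern of scaling every cell to a fixed total before taking the log.

```python
import numpy as np

def guess_transform(X, n_counts, rtol=1e-3):
    """Hypothetical helper: invert a few candidate transforms and compare
    the per-cell sums with the reported total UMI counts.

    X        : dense cells x genes array of stored values (assumed dense)
    n_counts : per-cell total UMI counts (e.g. the 'n_counts_UMIs' column)
    Returns a dict mapping each candidate name to a short verdict.
    """
    X = np.asarray(X, dtype=float)
    n_counts = np.asarray(n_counts, dtype=float)
    candidates = {
        "raw counts": X,
        "log(1 + counts)": np.expm1(X),        # inverse of natural-log log1p
        "log2(1 + counts)": np.exp2(X) - 1.0,  # inverse of log2(1 + x)
    }
    verdicts = {}
    for name, inverted in candidates.items():
        sums = inverted.sum(axis=1)  # sum over genes for each cell
        if np.allclose(sums, n_counts, rtol=rtol):
            verdicts[name] = "matches n_counts"
        elif np.allclose(sums, sums[0], rtol=rtol):
            # Every cell sums to the same constant: a sign that each cell
            # was scaled to a fixed total before the log was taken.
            verdicts[name] = f"constant per-cell sum ~ {sums[0]:.4g}"
        else:
            verdicts[name] = "no match"
    return verdicts

if __name__ == "__main__":
    # Synthetic demo: scale each cell to 10,000, then log1p.
    rng = np.random.default_rng(0)
    counts = rng.poisson(2.0, size=(5, 50)).astype(float)
    n_counts = counts.sum(axis=1)
    normed = np.log1p(counts / n_counts[:, None] * 1e4)
    print(guess_transform(normed, n_counts))
```

On the real data, a "constant per-cell sum" verdict would suggest per-cell normalization followed by a log transform, which would explain why no sum matches 'n_counts_UMIs'. That is a hypothesis to confirm with the data providers, not a documented property of the release.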
Thank you! As far as I understand, the publication is NOT yet available. I don't see scanpy mentioned anywhere; where do you see it? Anyway, scanpy offers several kinds of normalization.
You will find the reference to scanpy at the bottom of the page, below Tabula Sapiens on figshare. That's exactly why you need to check how they processed the data. If you cannot find it on the portal, you should contact them via email and ask.
It is written "to use with scanpy", not "normalized with scanpy". Anyway, thank you.
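If the matrix does turn out to be "scale each cell to a fixed total, then log1p" (one of the normalizations scanpy offers, `sc.pp.normalize_total` followed by `sc.pp.log1p`), the raw counts can in principle be reconstructed from it together with 'n_counts_UMIs'. The sketch below assumes exactly that pattern; whether Tabula Sapiens used it, and with which target total, is not stated in the thread and would have to be confirmed with the providers. `recover_counts` is a hypothetical helper name. The target total is optional because scanpy's default target is the median total count, so it need not be 10,000.

```python
import numpy as np

def recover_counts(X, n_counts, target_sum=None):
    """Hypothetical helper: undo 'scale each cell to target_sum, then log1p'.

    Assumes X[i, j] = log1p(counts[i, j] / n_counts[i] * target_sum).
    If target_sum is None it is estimated per cell as the row sum of
    expm1(X), which is only valid if no genes were dropped after
    normalization.
    """
    X = np.asarray(X, dtype=float)
    n_counts = np.asarray(n_counts, dtype=float)
    scaled = np.expm1(X)  # undo log1p
    if target_sum is None:
        target = scaled.sum(axis=1, keepdims=True)  # per-cell estimate
    else:
        target = float(target_sum)
    counts = scaled / target * n_counts[:, None]  # undo per-cell scaling
    return np.rint(counts).astype(np.int64)       # UMI counts are integers

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    counts = rng.poisson(3.0, size=(4, 30))
    n_counts = counts.sum(axis=1)
    X = np.log1p(counts / n_counts[:, None] * 1e4)
    print(np.array_equal(recover_counts(X, n_counts, target_sum=1e4), counts))
```

Rounding also works as a sanity check: if the values just before `np.rint` are far from whole numbers on the real data, the assumed transform is probably wrong.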