Question

How to save scRNA-seq datasets obtained with the R package "scRNAseq" to a CSV file

0

2.8 years ago

Alexander ▴ 220

There is an R package with a collection of scRNA-seq datasets: https://bioconductor.org/packages/release/data/experiment/vignettes/scRNAseq/inst/doc/scRNAseq.html

I am a Python user and know nothing about R, but I want to use some of those datasets. So I just want to load a dataset and save it to CSV; after that I will work in Python.

Question: after I install the package and load a dataset, how do I save it to CSV? (I mean the count matrix, the gene names, and the cell IDs.)

Here is an example of how to load a dataset: https://www.kaggle.com/code/alexandervc/try-rpackage-scrnaseq But what should I do next?

scRNA-seq • 3.2k views

0

2.8 years ago

Alexander ▴ 220

Thanks to rpolicastro for an excellent answer! Colleagues also suggested yet another way; let me save it here as well:

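library("scRNAseq")
# Other datasets load the same way, e.g. KolodziejczykESCData() or LaMannoBrainData('human-es')
sce <- MessmerESCData()
# Take the first assay (the counts) and convert it to a data frame
d1 <- as.data.frame(sce@assays@data[[1]])
write.csv(d1, "MessmerESCData.csv")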

Remark: this saves the counts, gene names, and sample IDs to the CSV file.

1

It's possible, but IMHO not recommended, for three reasons.

1) You should use getter functions rather than directly accessing the slots of a SingleCellExperiment (or any container format); here that would be counts() or assay(), as suggested in rpolicastro's answer. The reason is that if the internal structure changes in a future version of the package, your manual access to the data might stop working, whereas the getter keeps working: the developer simply changes the getter's internal code, and for the end user it stays the same.

2) as.data.frame expands the compressed/sparse count matrix into an ordinary dense one, which for large datasets might blow up your memory. In any case, it uses unnecessarily much memory. For this simple example what you do is fine, but in general, using getters and leaving sparse/compressed data as such is IMHO a good practice to get used to, and it is what rpolicastro's code snippet does.

3) A plain CSV is larger than the compressed mtx format, so mtx saves disk space, though both can be gzipped to further reduce size.
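To illustrate the memory point, here is a minimal sketch that uses only the Matrix package. The matrix is made up and merely stands in for counts(sce); its dimensions and density are arbitrary assumptions. It compares the memory footprint of a sparse matrix with that of its dense expansion.

```r
library("Matrix")

set.seed(1)
# Toy sparse matrix standing in for counts(sce): 2000 genes x 500 cells,
# about 5% non-zero entries (all values here are made up).
m <- rsparsematrix(2000, 500, density = 0.05,
                   rand.x = function(n) rpois(n, lambda = 3) + 1)

# Size in memory of the sparse object and of its dense expansion.
sparse_bytes <- as.numeric(object.size(m))
dense_bytes  <- as.numeric(object.size(as.matrix(m)))

# Memory used by each representation, in bytes.
c(sparse = sparse_bytes, dense = dense_bytes)
```

The gap grows with the size of the dataset and with the fraction of zeros, which is typically high for single-cell count data.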

Reply by ATpoint 87k, 2.8 years ago

rpolicastro · Accepted Answer · 2022-04-30

3

2.8 years ago

rpolicastro 13k

The package loads each dataset as a SingleCellExperiment object. From here you have a few options.

1) You can save the entire object as an h5ad file using zellkonverter; the file can be opened directly with scanpy in Python. I recommend this method because it also transfers all of the feature and sample metadata.

library("zellkonverter")

writeH5AD(sce, "sce.h5ad")

2) You can save the data in 10x Cell Ranger format.

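library("readr")
library("Matrix")

# write_tsv() expects a data frame, so wrap the name vectors; write no header row
write_tsv(data.frame(rownames(sce)), "features.tsv.gz", col_names = FALSE)
write_tsv(data.frame(colnames(sce)), "barcodes.tsv.gz", col_names = FALSE)
# writeMM() needs a sparse Matrix; if assay(sce) is dense, use Matrix(assay(sce), sparse = TRUE)
writeMM(assay(sce), "matrix.mtx")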

3) The last way is exporting the entire matrix as a CSV, which I don't usually do or recommend.

library("readr")
library("tibble")  # provides as_tibble()

assay(sce) |> as.matrix() |> as_tibble(rownames="features") |> write_csv("matrix.csv.gz")
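For option 2, a quick round-trip check can confirm that the Matrix Market export preserves the values. This is a minimal sketch on a made-up 3 x 2 matrix standing in for assay(sce); the gene and cell names and the output path are hypothetical.

```r
library("Matrix")

# Toy sparse count matrix standing in for assay(sce) (hypothetical values).
m <- sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 2), x = c(4, 1, 7),
                  dims = c(3, 2),
                  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:2)))

out <- file.path(tempdir(), "matrix.mtx")
writeMM(m, out)

# Matrix Market files carry no row or column names, which is why the
# features and barcodes are written to separate files.
m_back <- readMM(out)
dimnames(m_back) <- dimnames(m)

# TRUE if the values survived the round trip.
same_values <- all(as.matrix(m_back) == as.matrix(m))
same_values
```

On the Python side, the three files can then be combined again; scanpy offers readers for this layout, though the exact file names it expects are worth checking against its documentation.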

0

Thank you very much for your answer!

For the third option I got an error message: "Error in parse(text = x, srcfile = src): <text>:4:13: unexpected '>' 3: 4: assay(sce) |>"

Would you mind commenting on this, please?

Reply by Alexander ▴ 220, 2.8 years ago

2

|> is the native pipe in R, but it is only available in R 4.1.0 and later; older versions fail to parse it, which is the error you see. You could use the magrittr pipe, %>%, instead, or rewrite the command with nested function calls rather than a pipe.
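Both alternatives can be sketched as follows. This is a minimal example on a made-up matrix standing in for assay(sce); the gene and cell names and the output paths are hypothetical, and it assumes the readr, tibble, and magrittr packages are installed.

```r
library("readr")
library("tibble")
library("magrittr")  # provides %>%

# Toy stand-in for assay(sce): 3 genes x 2 cells (hypothetical values).
m <- matrix(c(0, 2, 1, 0, 5, 3), nrow = 3,
            dimnames = list(c("geneA", "geneB", "geneC"), c("cell1", "cell2")))

out_nested <- file.path(tempdir(), "matrix_nested.csv.gz")
out_pipe   <- file.path(tempdir(), "matrix_pipe.csv.gz")

# Alternative 1: nested calls, no pipe needed.
write_csv(as_tibble(as.matrix(m), rownames = "features"), out_nested)

# Alternative 2: the magrittr pipe, which also works on R versions before 4.1.0.
m %>% as.matrix() %>% as_tibble(rownames = "features") %>% write_csv(out_pipe)

# Both files should hold the same lines.
same_output <- identical(readLines(out_nested), readLines(out_pipe))
same_output
```

With a real dataset, replace m with assay(sce) and the temporary paths with the desired file name.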

Reply by Trivas ★ 1.9k, 2.8 years ago (updated 2.8 years ago by rpolicastro 13k)