How to save scRNA-seq datasets obtained from the R package "scRNAseq" to a CSV file
2.6 years ago
Alexander ▴ 220

There is an R package with a collection of scRNA-seq datasets: https://bioconductor.org/packages/release/data/experiment/vignettes/scRNAseq/inst/doc/scRNAseq.html

I am a Python user and know nothing about R, but I want to use some of those datasets. So I just want to load them and save them to CSV, and then continue in Python.

Question: after I install the package and load a dataset, how do I save it to CSV? (I mean the count matrix, gene names, and cell IDs.)

Here is an example of how to load one: https://www.kaggle.com/code/alexandervc/try-rpackage-scrnaseq But what do I do next?

scRNA-seq • 3.0k views
2.6 years ago

The package loads each dataset as a SingleCellExperiment object. From here you have a few options.

1) You can save the entire object as an h5ad file using zellkonverter, which can be opened directly with scanpy in Python. I recommend this method because it also transfers all of the feature and sample metadata.

library("zellkonverter")

writeH5AD(sce, "sce.h5ad")

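2) You can save the data in a 10X Cell Ranger-style format (feature names, barcodes, and a sparse matrix).

library("readr")
library("Matrix")

# write_lines() takes a character vector; write_tsv() would require a data frame
write_lines(rownames(sce), "features.tsv.gz")
write_lines(colnames(sce), "barcodes.tsv.gz")
# writeMM() needs a sparse Matrix, so coerce in case the assay is stored dense
writeMM(as(assay(sce), "CsparseMatrix"), "matrix.mtx")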

3) The last way is to export the entire matrix as a CSV, which I don't usually do or recommend.

library("readr")
library("tibble")  # provides as_tibble()

# The native pipe |> requires R 4.1 or later
assay(sce) |> as.matrix() |> as_tibble(rownames="features") |> write_csv("matrix.csv.gz")
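
On the Python side, the files written by options 2 and 3 can be read back with the standard library alone. The following is a minimal sketch, assuming the file names used above, one name per line in features.tsv.gz and barcodes.tsv.gz, and a coordinate-format matrix.mtx with explicit values. The helper names (open_text, read_names, read_mtx, load_10x, read_count_csv) are made up for this illustration; in practice scipy.io.mmread or pandas.read_csv would be the more usual tools.

```python
import csv
import gzip
from pathlib import Path


def open_text(path):
    """Open a plain or gzip-compressed file for reading as text."""
    path = str(path)
    opener = gzip.open if path.endswith(".gz") else open
    return opener(path, "rt", newline="")


def read_names(path):
    """Read one name per line (gene names or cell barcodes)."""
    with open_text(path) as handle:
        return [line.rstrip("\r\n") for line in handle if line.strip()]


def read_mtx(path):
    """Parse a Matrix Market coordinate file into ((n_rows, n_cols), entries).

    entries maps (row, col) to value, with indices converted from the
    1-based numbering of the file to 0-based numbering.
    """
    entries = {}
    with open_text(path) as handle:
        header = handle.readline()
        if not header.startswith("%%MatrixMarket matrix coordinate"):
            raise ValueError("expected a Matrix Market coordinate file")
        line = handle.readline()
        while line.startswith("%"):  # skip optional comment lines
            line = handle.readline()
        n_rows, n_cols, n_nonzero = (int(x) for x in line.split())
        for line in handle:
            if not line.strip():
                continue
            i, j, value = line.split()
            entries[(int(i) - 1, int(j) - 1)] = float(value)
    if len(entries) != n_nonzero:
        raise ValueError("number of entries does not match the header")
    return (n_rows, n_cols), entries


def load_10x(directory):
    """Load the three files from option 2 (assumed file names)."""
    directory = Path(directory)
    genes = read_names(directory / "features.tsv.gz")
    cells = read_names(directory / "barcodes.tsv.gz")
    shape, entries = read_mtx(directory / "matrix.mtx")
    if shape != (len(genes), len(cells)):
        raise ValueError("matrix shape does not match the name files")
    return genes, cells, entries


def read_count_csv(path):
    """Load the genes-by-cells CSV from option 3.

    The first column holds gene names and the header holds cell IDs.
    """
    with open_text(path) as handle:
        reader = csv.reader(handle)
        cell_ids = next(reader)[1:]
        gene_names, rows = [], []
        for record in reader:
            gene_names.append(record[0])
            rows.append([float(x) for x in record[1:]])
    return gene_names, cell_ids, rows
```

Calling load_10x(".") in the directory where the R code ran returns the gene names, the cell barcodes, and a dictionary holding only the non-zero counts, so the matrix is never expanded to dense form. read_count_csv("matrix.csv.gz") returns the full dense table as a list of rows.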

Thank you very much for your answer!

For the third option I got an error message: "Error in parse(text = x, srcfile = src): <text>:4:13: unexpected '>' 3: 4: assay(sce) |>"

Would you mind commenting on this, please?


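|> is the native pipe in R, but it is only available in R 4.1 and later. You could use the magrittr pipe %>% instead, or rewrite the command with nested function calls instead of the pipe, e.g. write_csv(as_tibble(as.matrix(assay(sce)), rownames="features"), "matrix.csv.gz").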

2.6 years ago
Alexander ▴ 220

Thanks to rpolicastro for an excellent answer! Colleagues also suggested yet another way; let me save it here as well:

sce <- MessmerESCData() # KolodziejczykESCData() #  LaMannoBrainData('human-es')
d1 = as.data.frame(sce@assays@data[[1]])
write.csv(d1, "MessmerESCData.csv")

Remark: this saves the counts, gene names, and sample IDs to the CSV.


It's possible, but IMHO not recommended, for three reasons.

1) You should use getter functions rather than accessing the slots of a SingleCellExperiment (or any container format) directly; here that would be counts() or assay(), as suggested above. If the structure of the format changes in a future version of the package, your manual slot access might stop working, whereas the getter should keep working, because the developer simply updates its internal code while the interface stays the same for the end user.

2) as.data.frame() expands a compressed/sparse count matrix into an ordinary dense one, and for large datasets that might blow up your memory. In any case, it uses unnecessarily much memory. For this simple example what you do is fine, but in general using getters and leaving sparse/compressed data as such is IMHO a good habit to get into, and that is what rpolicastro's code snippet does.

3) A plain CSV is larger than the sparse mtx format, so mtx saves disk space, though both can be gzipped to reduce size further.
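
To put rough numbers on the memory point, here is a back-of-the-envelope sketch. It assumes 8-byte values, 4-byte indices, and a compressed sparse column layout (one value and one row index per non-zero entry, plus one pointer per column and one extra). The dataset size and the 10% non-zero fraction are hypothetical, not measured on any dataset in this thread, and the function names are made up for the illustration.

```python
def dense_bytes(n_genes, n_cells, bytes_per_value=8):
    """Memory needed to store every entry, zeros included."""
    return n_genes * n_cells * bytes_per_value


def sparse_bytes(n_genes, n_cells, fraction_nonzero,
                 bytes_per_value=8, bytes_per_index=4):
    """Memory for a compressed sparse column layout.

    Each non-zero entry stores a value and a row index; the column
    pointer array has n_cells + 1 entries.
    """
    n_nonzero = round(n_genes * n_cells * fraction_nonzero)
    per_entry = bytes_per_value + bytes_per_index
    return n_nonzero * per_entry + (n_cells + 1) * bytes_per_index


# Hypothetical dataset: 20,000 genes x 10,000 cells, 10% non-zero counts
dense = dense_bytes(20_000, 10_000)
sparse = sparse_bytes(20_000, 10_000, 0.10)
print(f"dense:  {dense / 1e9:.2f} GB")
print(f"sparse: {sparse / 1e9:.2f} GB")
```

Under these assumptions the dense matrix needs about 1.6 GB and the sparse one about 0.24 GB, and the gap widens as the fraction of zeros grows.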
