Question

h5ad cellxgene to R

2.6 years ago

firestar ★ 1.7k

I am trying to import this small dataset (1,496 cells, 43 MB) from cellxgene into R.

1 R SeuratDisk

Convert h5ad to h5seurat, then load it as a Seurat object.

library(SeuratDisk)
SeuratDisk::Convert("local.h5ad", dest = "local.h5seurat", overwrite=TRUE)
g <- SeuratDisk::LoadH5Seurat("local.h5seurat", meta.data=FALSE, misc=FALSE)

 Validating h5Seurat file
 Initializing RNA with data
 Adding counts for RNA
 Adding feature-level metadata for RNA
 Error: Missing required datasets 'levels' and 'values'

or, on another attempt, a different error:

Validating h5Seurat file
Initializing RNA with data
Error in if ((lp <- length(p)) < 1 || p[1] != 0 || any((dp <- p[-1] -  : 
  missing value where TRUE/FALSE needed
In addition: Warning message:
In sparseMatrix(i = x[["indices"]][] + 1, p = x[["indptr"]][], x = x[["data"]][],  :
  NAs introduced by coercion to integer range

The conversion to h5seurat works, but reading the h5seurat file fails.
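One plausible cause of the 'levels' and 'values' error is a layout mismatch, though this is not confirmed here. SeuratDisk's converter was written for the older h5ad layout. Files written by anndata 0.8 and later store each categorical column as its own subgroup containing 'categories' and 'codes'. The sketch below reports which layout the var (feature metadata) and obs groups use. It assumes h5py is installed to open the file, and categorical_columns is a hypothetical helper name.

```python
from collections.abc import Mapping


def categorical_columns(df_group):
    """Report how categorical columns are laid out in an AnnData obs/var group.

    'new': anndata >= 0.8 style (assumed), one subgroup per column holding
           'categories' and 'codes' datasets.
    'old': pre-0.8 style, categories kept under a shared '__categories' group.
    Works on h5py groups (which are Mappings) or on plain dicts for testing.
    """
    new_style = [
        name
        for name, item in df_group.items()
        if isinstance(item, Mapping) and "categories" in item and "codes" in item
    ]
    legacy = df_group.get("__categories")
    old_style = sorted(legacy.keys()) if isinstance(legacy, Mapping) else []
    return {"new": sorted(new_style), "old": old_style}


# Usage (requires h5py):
#   import h5py
#   with h5py.File("local.h5ad", "r") as f:
#       print(categorical_columns(f["var"]))
#       print(categorical_columns(f["obs"]))
```

If "new" is non-empty and "old" is empty, the file uses the newer layout.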

2 R anndata

library(anndata)
g <- read_h5ad("local.h5ad") # crashes rstudio locally

Error in py_call_impl(callable, dots$args, dots$keywords) : 
  anndata._io.utils.AnnDataReadError: Above error raised while reading key '/layers' of type <class 'h5py._hl.group.Group'> from /.

This fails too, for reasons I could not figure out.
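The error is raised while reading the '/layers' group. anndata records how each element was written in the encoding-type and encoding-version attributes. Newer anndata versions (0.8 onwards, to my understanding) tag mapping groups such as layers with encoding-type "dict". The minimal sketch below lists these attributes for every top-level key, so you can see how the file was written. It assumes the file is opened with h5py, and encoding_summary is a hypothetical helper name.

```python
def _as_text(value):
    """h5py may return string attributes as bytes (older versions) or str."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return value


def encoding_summary(root):
    """Map each top-level key of an h5ad file to (encoding-type, encoding-version).

    `root` is an open h5py.File; any mapping whose values carry a dict-like
    `.attrs` works too. Missing attributes are reported as None.
    """
    summary = {}
    for key, item in root.items():
        attrs = getattr(item, "attrs", {})
        summary[key] = (
            _as_text(attrs.get("encoding-type")),
            _as_text(attrs.get("encoding-version")),
        )
    return summary


# Usage (requires h5py):
#   import h5py
#   with h5py.File("local.h5ad", "r") as f:
#       for key, enc in encoding_summary(f).items():
#           print(key, enc)
```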

3 R SCP

h5ad to Seurat object, using scanpy through reticulate.

# remotes::install_github("zhanghao-njmu/SCP",upgrade="never")
library(SCP)
library(reticulate)
sc <- import("scanpy") # crashes rstudio locally
adata <- sc$read_h5ad("local.h5ad")
srt <- adata_to_srt(adata)

R crashes at the sc <- import("scanpy") line. I assume it has something to do with reticulate.
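Because a crash inside reticulate takes down the whole R session, it can help to check first whether import scanpy works in the same Python interpreter outside R. In R, reticulate::py_config() reports which interpreter reticulate uses. The minimal sketch below runs the import in a separate process, so a segfault only kills the child. import_succeeds is a hypothetical helper name.

```python
import subprocess
import sys


def import_succeeds(module, python=sys.executable, timeout=600):
    """Run `import <module>` in a fresh Python process.

    Returns (ok, stderr). A crash (e.g. a segfault) only kills the child
    process, so it shows up as ok=False instead of taking down the caller.
    """
    if not all(part.isidentifier() for part in module.split(".")):
        raise ValueError(f"not a module name: {module!r}")
    try:
        result = subprocess.run(
            [python, "-c", f"import {module}"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout} s"
    return result.returncode == 0, result.stderr.strip()


# Usage (the interpreter path is a placeholder; use the one py_config() shows):
#   ok, err = import_succeeds("scanpy", python="/path/to/python")
#   print(ok, err)
```

If the import also fails here, the problem is in the Python environment rather than in reticulate.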

4 R Zellkonverter

# remotes::install_github("theislab/zellkonverter",upgrade="never")
library(zellkonverter)
g <- readH5AD("local.h5ad", verbose = TRUE, reader = "python")

ℹ Using the Python reader
ℹ Using anndata version 0.8.0
sh: 5: /home/roy/miniconda3/envs/r-4.1/etc/conda/deactivate.d/udunits2-deactivate.sh: [[: not found
sh: 5: /home/roy/miniconda3/envs/r-4.1/etc/conda/deactivate.d/geotiff-deactivate.sh: [[: not found
sh: 5: /home/roy/miniconda3/envs/r-4.1/etc/conda/deactivate.d/gdal-deactivate.sh: [[: not found
sh: 11: /home/roy/miniconda3/envs/r-4.1/etc/conda/deactivate.d/gdal-deactivate.sh: [[: not found
sh: 4: /home/roy/miniconda3/envs/r-4.1/etc/conda/deactivate.d/deactivate-r-base.sh: [[: not found
sh: 5: /home/roy/miniconda3/envs/r-4.1/etc/conda/deactivate.d/deactivate-gxx_linux-64.sh: Syntax error: "(" unexpected
Warning message:
In system(paste(act.cmd, collapse = " "), intern = TRUE) :
  running command '. '/home/roy/.cache/R/basilisk/1.4.0/0/etc/profile.d/conda.sh' && conda activate && /home/roy/miniconda3/envs/r-4.1/lib/R/bin/Rscript --no-save --no-restore --no-site-file --no-init-file --default-packages=NULL -e "con <- socketConnection(port=11303, open='wb', blocking=TRUE);serialize(Sys.getenv(), con);close(con)"' had status 2

This fails with the Python reader. A conda issue?
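The "sh: ... [[: not found" lines would be consistent with bash-only activation scripts from the r-4.1 environment being sourced by sh. On Debian/Ubuntu, sh is usually dash, which does not implement [[. This is a hypothesis, not a confirmed diagnosis. The small sketch below lists the lines involved. find_bash_only_tests is a hypothetical helper name, and it only looks for [[, not for other bash-only syntax.

```python
from pathlib import Path


def find_bash_only_tests(script_dir):
    """List lines in *.sh scripts that use the bash-only `[[ ... ]]` test.

    POSIX sh (dash on Debian/Ubuntu) has no `[[`, which produces
    "[[: not found" when such a script is sourced by sh.
    Returns (file name, line number, stripped line) tuples.
    """
    hits = []
    for script in sorted(Path(script_dir).glob("*.sh")):
        text = script.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            stripped = line.strip()
            if stripped.startswith("#"):
                continue  # ignore comment lines
            if "[[" in stripped:
                hits.append((script.name, lineno, stripped))
    return hits


# Usage:
#   for hit in find_bash_only_tests("/home/roy/miniconda3/envs/r-4.1/etc/conda/deactivate.d"):
#       print(*hit)
```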

library(zellkonverter)
g <- readH5AD("local.h5ad", verbose = TRUE, reader = "R")

ℹ Using the R reader
✔ Reading local.h5ad [3.9s]
Warning message:
In value[[3L]](cond) :
  setting 'colData' failed for 'local.h5ad': cannot coerce class
  "list" to a DataFrame

It also fails with the R reader: the file is read, but setting colData (the cell metadata) fails.

5 Python scanpy

I also tried scanpy in Python. I don't know any Python, so I just tried two lines of basic code.

import scanpy as sc
g = sc.read_h5ad("local.h5ad")

Traceback (most recent call last):
  File "/home/roy/miniconda3/lib/python3.8/site-packages/anndata/_io/utils.py", line 156, in func_wrapper
    return func(elem, *args, **kwargs)
  File "/home/roy/miniconda3/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 510, in read_group
    EncodingVersions[encoding_type].check(
  File "/home/roy/miniconda3/lib/python3.8/enum.py", line 387, in __getitem__
    return cls._member_map_[name]
KeyError: 'dict'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/roy/miniconda3/lib/python3.8/site-packages/anndata/_io/h5ad.py", line 413, in read_h5ad
    d[k] = read_attribute(f[k])
  File "/home/roy/miniconda3/lib/python3.8/functools.py", line 875, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/roy/miniconda3/lib/python3.8/site-packages/anndata/_io/utils.py", line 162, in func_wrapper
    raise AnnDataReadError(
anndata._io.utils.AnnDataReadError: Above error raised while reading key '/layers' of type <class 'h5py._hl.group.Group'> from /.
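The KeyError: 'dict' is raised while anndata looks up the encoding type of /layers. This suggests the installed anndata does not know that encoding. Note that zellkonverter's Python reader reported anndata 0.8.0 in its own environment. This traceback, however, comes from the base miniconda Python 3.8 install, which may have an older anndata. The minimal sketch below checks a version string against an assumed minimum of 0.8.0. The threshold and the helper names are assumptions, not anndata API.

```python
import re

# Assumption: anndata 0.8 introduced the on-disk encoding that tags groups such
# as /layers with encoding-type "dict"; older readers do not know that type.
MIN_DICT_ENCODING = "0.8.0"


def parse_version(text):
    """Leading numeric part of a version string, e.g. '0.7.8' -> (0, 7, 8)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def at_least(version, minimum):
    """Compare versions numerically, padding with zeros so '0.8' == '0.8.0'."""
    a, b = parse_version(version), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def reads_dict_encoding(anndata_version):
    return at_least(anndata_version, MIN_DICT_ENCODING)


# Usage:
#   import anndata
#   print(anndata.__version__, reads_dict_encoding(anndata.__version__))
```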

I am not sure whether this is related to the input file or is just a series of unrelated issues. How does one import a cellxgene dataset into R? Are there any other tools or solutions, perhaps a web app or a Docker container? All I want is to get the raw counts and metadata out.

Using R 4.0.1 and/or R 4.1.1, and Python 3.8.13.

single-cell h5 cellxgene anndata R • 8.5k views

2.1 years ago by firestar ★ 1.7k

Unless you specifically need the h5ad file, downloading the rds file is more straightforward and should give you access to the raw counts and metadata.

2.6 years ago by Erdogan • 0

Large datasets do not have the Rds download as an option.

2.5 years ago by firestar ★ 1.7k

I am trying to use the same data and the same method, but when I read the CSV expression matrix into R, it freezes. Could you share your final seurat.Rds file? That would help many people. Thanks.

2.1 years ago by abmmki • 0

GenoMax · Accepted Answer · 2023-01-15

I finally got it to work by exporting the data from scanpy as CSV files.

Run this Python script, assuming the input file is file.h5ad. It exports raw-counts.csv and metadata.csv.

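import scanpy as sc
import numpy as np
import pandas as pd

# print library versions (useful when debugging h5ad compatibility)
print(sc.__version__)
print(np.__version__)
print(pd.__version__)

print("Reading data...")
adata = sc.read_h5ad("file.h5ad")

print("Writing data...")
# raw counts are taken from adata.raw.X; it is usually a sparse matrix,
# so it is densified here (this step needs most of the RAM)
X = adata.raw.X
t = X.toarray() if hasattr(X, "toarray") else np.asarray(X)
# rows = cells, columns = genes; the index (cell names) is written as the first column
pd.DataFrame(data=t, index=adata.obs_names, columns=adata.raw.var_names).to_csv("raw-counts.csv")
# cell-level metadata; the index (cell names) is written as the first column
pd.DataFrame(adata.obs).to_csv("metadata.csv")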

This R script then reads the counts and metadata and creates a Seurat object.

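library(Seurat)

message("Reading counts...")
# check.names=FALSE keeps gene names as written (otherwise R rewrites characters such as "-")
x <- read.csv("raw-counts.csv", header=TRUE, check.names=FALSE)
rownames(x) <- x[,1]   # first column holds the cell names
x[,1] <- NULL
print(dim(x))
print(x[1:5,1:5])

message("Reading metadata...")
m <- read.csv("metadata.csv", header=TRUE, check.names=FALSE)
rownames(m) <- m[,1]   # first column holds the cell names
colnames(m)[1] <- "sample"
print(dim(m))
print(head(m))

message("Writing seurat object...")
# t(x): Seurat expects genes as rows and cells as columns
saveRDS(
  CreateSeuratObject(counts=t(x), meta.data=m, project="seurat", min.cells=0, min.features=0),
  "seurat.Rds"
)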

My issues with zellkonverter may have been caused by the dataset being too large. To give an idea of the resources needed: a cellxgene dataset with 1,135,677 cells and an 11 GB h5ad file used about 100 GB of RAM and 2 hours to export a 29 GB CSV file. Reading it into R and saving it as a Seurat object took about 192 GB of RAM and about an hour.
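Much of the memory cost described above comes from densifying the matrix, since every zero count is written out as text in the CSV. The sketch below shows an alternative that was not used here. It streams only the non-zero entries of the sparse raw matrix to a Matrix Market file, which R can read with Matrix::readMM(). write_csr_as_mtx is a hypothetical helper name. If SciPy is available, scipy.io.mmwrite does the same job.

```python
def write_csr_as_mtx(data, indices, indptr, shape, path, transpose=True):
    """Stream a CSR matrix to a Matrix Market coordinate file.

    data/indices/indptr are the usual CSR arrays (e.g. from a scipy.sparse
    csr_matrix) for a cells x genes matrix of the given shape. Only stored
    entries are written, so the matrix is never densified. With transpose=True
    the file is genes x cells, the orientation Seurat expects for counts.
    """
    n_rows, n_cols = shape
    out_shape = (n_cols, n_rows) if transpose else (n_rows, n_cols)
    nnz = indptr[n_rows]  # number of stored entries
    with open(path, "w") as fh:
        fh.write("%%MatrixMarket matrix coordinate real general\n")
        fh.write(f"{out_shape[0]} {out_shape[1]} {nnz}\n")
        for row in range(n_rows):
            for k in range(indptr[row], indptr[row + 1]):
                i, j = row + 1, indices[k] + 1  # Matrix Market indices are 1-based
                if transpose:
                    i, j = j, i
                fh.write(f"{i} {j} {data[k]}\n")
```

With scanpy, the arrays would come from X = adata.raw.X.tocsr(), as X.data, X.indices, X.indptr and X.shape. The cell and gene names (adata.obs_names, adata.raw.var_names) still need to be written separately, for example with pandas, and assigned as column and row names in R. A pure-Python loop is slow for a million cells, so treat this as an illustration of the format rather than a tuned exporter.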