I'm trying to parse .h5 files containing scRNA-seq data from this GEO entry: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE148073
However, I'm not sure how to access the count matrix, in either sparse or dense form.
Opening the files with h5py (with h5py.File(testpath, 'r') as f:), I can only find these top-level entries in the .h5 files:
['barcode', 'barcode_corrected_reads', 'conf_mapped_uniq_read_pos', 'gem_group', 'gene', 'gene_ids', 'gene_names', 'genome', 'genome_ids', 'metrics', 'nonconf_mapped_reads', 'reads', 'umi', 'umi_corrected_reads', 'unmapped_reads']
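But their shapes don't make it obvious to me how to extract any kind of count matrix from this data. I'm expecting about 3000 cells per sample (each sample is one .h5 file), yet most datasets are one-dimensional with 33,395,910 entries, and only the gene datasets (33,694 entries) have a size I recognize.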
<HDF5 dataset "barcode": shape (33395910,), type "<u8">
<HDF5 dataset "barcode_corrected_reads": shape (33395910,), type "<u4">
<HDF5 dataset "conf_mapped_uniq_read_pos": shape (33395910,), type "<u4">
<HDF5 dataset "gem_group": shape (33395910,), type "<u2">
<HDF5 dataset "gene": shape (33395910,), type "<u4">
<HDF5 dataset "gene_ids": shape (33694,), type "|S15">
<HDF5 dataset "gene_names": shape (33694,), type "|S19">
<HDF5 dataset "genome": shape (33395910,), type "|u1">
<HDF5 dataset "genome_ids": shape (1,), type "|S6">
<HDF5 group "/metrics" (0 members)>
<HDF5 dataset "nonconf_mapped_reads": shape (33395910,), type "<u4">
<HDF5 dataset "reads": shape (33395910,), type "<u4">
<HDF5 dataset "umi": shape (33395910,), type "<u4">
<HDF5 dataset "umi_corrected_reads": shape (33395910,), type "<u4">
<HDF5 dataset "unmapped_reads": shape (33395910,), type "<u4">
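The listing above can be reproduced with a short h5py sketch that walks every group and dataset recursively and records its shape and dtype. It only uses h5py's Group.visititems; the function name describe_h5 is a hypothetical helper, not part of any library.

```python
import h5py


def describe_h5(path):
    """Return (name, kind, shape, dtype) for every group and dataset in an HDF5 file."""
    entries = []

    def visit(name, obj):
        # visititems walks the file recursively and calls this for each object.
        if isinstance(obj, h5py.Dataset):
            # dtype.str gives the compact form shown in the listing, e.g. '<u8' or '|S15'.
            entries.append((name, "dataset", obj.shape, obj.dtype.str))
        else:
            # Groups (such as an empty /metrics) have no shape or dtype of their own.
            entries.append((name, "group", None, None))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return entries
```

Usage would be something like `for entry in describe_h5(testpath): print(entry)`. Unlike iterating over f.keys(), this also descends into nested groups, so it would reveal a nested matrix group if one existed.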
Any help would be really appreciated. Thank you!
Radu Tanasa
Thank you for the speedy reply!
I tried it, but it throws an error. According to the docs, it reads 10x Genomics-formatted HDF5 files. I'm not sure the .h5 files here follow the layout I see in most such files: they were definitely created with Cell Ranger, but they don't contain a 'matrix' group as described in https://www.10xgenomics.com/support/software/cell-ranger/latest/analysis/outputs/cr-outputs-h5-matrices
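The dataset names (barcode, umi, gene, reads, gem_group, conf_mapped_uniq_read_pos, ...) look like Cell Ranger's per-molecule output (molecule_info.h5, in what appears to be the older 2.x layout) rather than a feature-barcode matrix file, which would explain the missing 'matrix' group. Assuming that, each position i in the 33,395,910-long arrays describes one molecule, and a barcodes x genes UMI count matrix can be assembled by counting molecules per (barcode, gene). The sketch below is built on stated assumptions, not confirmed against the GEO files: `gene` values >= len(gene_ids) mark molecules not assigned to a gene; `reads` counts confidently mapped reads per molecule; barcodes are 2-bit packed 16-mers (A=0, C=1, G=2, T=3); and each file holds a single gem group. The function names are hypothetical.

```python
import h5py
import numpy as np
import scipy.sparse as sp


def decode_barcode(code, length=16):
    """Decode a 2-bit-packed barcode integer into a nucleotide string (assumed encoding)."""
    code = int(code)
    chars = []
    for _ in range(length):
        chars.append("ACGT"[code & 3])  # lowest 2 bits hold the last base
        code >>= 2
    return "".join(reversed(chars))


def molecule_info_to_matrix(path, min_reads=1):
    """Build a sparse barcodes x genes UMI count matrix from per-molecule arrays."""
    with h5py.File(path, "r") as f:
        barcode = f["barcode"][:]
        gene = f["gene"][:]
        reads = f["reads"][:]
        gene_ids = f["gene_ids"][:].astype(str)
        gene_names = f["gene_names"][:].astype(str)

    # Keep molecules with enough mapped reads and a valid gene index (assumption above).
    keep = (reads >= min_reads) & (gene < len(gene_ids))
    barcode, gene = barcode[keep], gene[keep]

    # Map each distinct barcode integer to a row index.
    uniq_bc, row = np.unique(barcode, return_inverse=True)

    # Each molecule contributes one UMI; tocsr() sums duplicate (row, col) entries.
    counts = sp.coo_matrix(
        (np.ones(len(row), dtype=np.int32), (row, gene.astype(np.int64))),
        shape=(len(uniq_bc), len(gene_ids)),
    ).tocsr()
    barcodes = [decode_barcode(b) for b in uniq_bc]
    return counts, barcodes, gene_ids, gene_names


def top_barcodes(counts, n=3000):
    """Crude cell filter: row indices of the n barcodes with the most UMIs."""
    totals = np.asarray(counts.sum(axis=1)).ravel()
    return np.argsort(totals)[::-1][:n]
```

The resulting matrix contains every observed barcode, including background, so it will have far more than ~3000 rows. top_barcodes is only a rough stand-in for Cell Ranger's own cell calling; if the GEO entry provides barcode lists, subsetting to those would be more faithful.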