Reading files of different types as H5AD
6 months ago
JACKY ▴ 170

I want to download scRNA-seq count data from GEO and eventually save it as an H5AD file. Usually I download three files for the counts: barcodes, features, and matrix.

However, some GEO datasets provide other kinds of files, such as TSV or TXT. For example, this one provides counts along with some information about the cells, and this one provides only counts as TXT files for the first 5 samples (which are the scRNA-seq samples I want). Using scanpy, can I create H5AD files from these?

Thanks!

python scanpy anndata single-cell • 539 views
6 months ago
bk11 ★ 3.0k

Here is how to save the count matrix of sample GSM6506112_SP3 from GEO series GSE211956 as an h5ad file.

import pandas as pd
import scanpy as sc
import os

results_file = "GSE211956_RAW/GSM6506112_SP3.h5ad"

os.listdir("GSE211956_RAW/")
['GSM6506112_SP3_features.tsv.gz',
 'GSM6506112_SP3_barcodes.tsv.gz',
 'GSM6506112_SP3_matrix.mtx.gz']

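# The .mtx file stores genes x cells, so transpose to get cells x genes
adata = sc.read_mtx('GSE211956_RAW/GSM6506112_SP3_matrix.mtx.gz').T
adata_bc = pd.read_csv('GSE211956_RAW/GSM6506112_SP3_barcodes.tsv.gz', header=None)
adata_features = pd.read_csv('GSE211956_RAW/GSM6506112_SP3_features.tsv.gz', header=None, sep='\t')
# Label the cells with their barcodes
adata.obs_names = adata_bc[0].tolist()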

adata_features
    0   1   2
0   ENSG00000243485 MIR1302-2HG Gene Expression
1   ENSG00000237613 FAM138A Gene Expression
2   ENSG00000186092 OR4F5   Gene Expression
3   ENSG00000238009 AL627309.1  Gene Expression
4   ENSG00000239945 AL627309.3  Gene Expression
... ... ... ...
33533   ENSG00000277856 AC233755.2  Gene Expression
33534   ENSG00000275063 AC233755.1  Gene Expression
33535   ENSG00000271254 AC240274.1  Gene Expression
33536   ENSG00000277475 AC213203.1  Gene Expression
33537   ENSG00000268674 FAM231C Gene Expression
33538 rows × 3 columns

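# Column 0 holds the Ensembl IDs and column 1 the gene symbols
adata.var['gene_id'] = adata_features[0].tolist()
adata.var_names = adata_features[1].tolist()
adata.var_names_make_unique()  # gene symbols can repeat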

adata
AnnData object with n_obs × n_vars = 1504 × 33538
    var: 'gene_id'

adata.write_h5ad(results_file)
os.listdir("GSE211956_RAW/")
['GSM6506112_SP3_features.tsv.gz',
 'GSM6506112_SP3.h5ad',
 'GSM6506112_SP3_barcodes.tsv.gz',
 'GSM6506112_SP3_matrix.mtx.gz']