Question

ValueError when loading expression matrix into scanpy

3.4 years ago

c2e09af0 • 0

Hello everyone, I am new to bioinformatics and want to build a reference atlas and project my own data onto it using scArches and other packages like scanpy. However, I'm having trouble loading the reference dataset. I downloaded the exprMatrix.tsv.gz file from https://cells-test.gi.ucsc.edu/?ds=early-brain and used the following code to import the data into Python:

   import scanpy as sc
   adata = sc.read_text("exprMatrix.tsv.gz")

I get this error:

ValueError: could not convert string to float: 'NA'

I tried loading the data in R with the Seurat package, which worked after appending one empty line. Could it be that Python and R use different representations for missing values ('NA' vs. 'NaN'), so that Python cannot load the file? Can I just replace the 'NA' values with 'NaN' in the file, or do they have a different meaning?

I would very much appreciate help. Thank you for taking the time!

RNA-seq Python scArches scanpy scRNAseq • 2.0k views

score 1 · Accepted Answer · 2022-03-29

3.4 years ago

zorbax ▴ 650

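I think scanpy's read functions parse the values straight into a numeric NumPy array, so a non-numeric entry such as 'NA' makes them fail. You can instead load the file with pandas as a data frame, which by default reads 'NA' as a missing value (NaN), and then build the AnnData object from it: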

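import scanpy as sc
import pandas as pd

# Read the gzipped TSV in chunks of up to 1,000,000 rows; the first column becomes the index.
chunks = pd.read_table("~/path/to/exprMatrix.tsv.gz", index_col=0, chunksize=1000000)
# Concatenate the chunks into a single data frame.
df = pd.concat(chunks)
# Wrap the data frame in an AnnData object (rows become observations, columns become variables).
adata = sc.AnnData(df)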
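
On the 'NA' vs. 'NaN' part of the question, here is a minimal sketch of the difference, using a made-up two-gene table rather than the real file (the gene and cell names are hypothetical). It shows that Python's float() rejects 'NA' but accepts 'NaN', and that pandas treats 'NA' as a missing value by default:

```python
import io

import pandas as pd

# Hypothetical miniature of an expression matrix (not the real dataset):
# a header row, then one row per gene, tab-separated, with one 'NA' entry.
TSV = "gene\tcell1\tcell2\nGeneA\t1.0\tNA\nGeneB\t0\t2.5\n"


def plain_float_fails(token):
    """Return True if Python's built-in float() rejects the token."""
    try:
        float(token)
    except ValueError:
        return True
    return False


# 'NA' is not a valid float literal in Python, whereas 'NaN' is.
na_rejected = plain_float_fails("NA")
nan_rejected = plain_float_fails("NaN")

# pandas includes 'NA' in its default list of missing-value markers,
# so the entry is parsed as NaN and the column stays numeric.
df = pd.read_table(io.StringIO(TSV), index_col=0)
n_missing = int(df.isna().sum().sum())

print(na_rejected, nan_rejected, n_missing)
print(df.dtypes)
```

So both markers mean "missing", and replacing 'NA' with 'NaN' in the file should also let it parse, but the values are still missing either way. It is probably worth counting them (as with n_missing above) and deciding how to handle them before downstream analysis. Also, if the file stores genes in rows and cells in columns, the data frame would need to be transposed (df.T) before building the AnnData object, since AnnData expects cells as rows.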