Question

How to create a Seurat object from the `xxx_metadata.txt.gz` and `xxx_counts.txt.gz` files?

22 months ago

Sun • 0

Dear all,

I downloaded two scRNA-seq data files from GEO: xxx_metadata.txt.gz and xxx_counts.txt.gz. I want to load these files and create a Seurat object.

The xxx_metadata.txt.gz file can be read with:

metadata <- read.table(gzfile("xxx_metadata.txt.gz"), header = TRUE, sep = "\t")

However, when I use read.table to read the counts file:

counts <- read.table(gzfile("xxx_counts.txt.gz"), header = TRUE, sep = "\t", row.names = 1)

I found that it is extremely slow and consumes a large amount of RAM. The xxx_counts.txt.gz file is about 2 GB, but read.table does not finish even after taking up 10 GB of RAM.

Could you tell me how to create a Seurat object from these two files? I would deeply appreciate any help.

Seurat • 3.9k views

ADD COMMENT • link updated 22 months ago by swbarnes2 14k • written 22 months ago by Sun • 0

How big are these files? My best guess is that this is a matrix. If there is more than one sample, you would have to deal with that separately, and you would also need to know whether the data is already normalized or is raw counts. Either way, once you have loaded the matrix into R, you can create a Seurat object using

library(Seurat)

seuratObj <- CreateSeuratObject(counts = counts)

You may also be able to reduce RAM consumption by using a sparse matrix.
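As a rough sketch of the sparse-matrix idea, the snippet below builds a small toy count matrix (a stand-in for the real data, not the GEO file), converts it with `Matrix()` from the Matrix package, and passes it to `CreateSeuratObject()`. The names `dense_counts` and `sparse_counts` are illustrative assumptions.

```r
library(Matrix)
library(Seurat)

# Toy dense count matrix standing in for the real data:
# genes in rows, cells in columns, mostly zeros.
set.seed(1)
dense_counts <- matrix(
  rpois(200 * 30, lambda = 0.3),
  nrow = 200,
  dimnames = list(paste0("gene", 1:200), paste0("cell", 1:30))
)

# A sparse matrix stores only the non-zero entries.
sparse_counts <- Matrix(dense_counts, sparse = TRUE)

# Compare the memory footprint of the two representations.
print(object.size(dense_counts), units = "Kb")
print(object.size(sparse_counts), units = "Kb")

# CreateSeuratObject() accepts the sparse matrix directly.
seuratObj <- CreateSeuratObject(counts = sparse_counts)
```

The saving grows with the fraction of zeros, which tends to be high in scRNA-seq count tables.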

ADD REPLY • link 22 months ago by biofalconch ★ 1.3k

score 0 · Answer 1 · 2023-06-16

22 months ago

Haci ▴ 730

In order to create a Seurat object, you would need to use one of the functions here that start with Read: https://satijalab.org/seurat/reference/

The function to use depends on how the data was generated; for 10x data, for example, that would be Read10X().

As for the metadata, see AddMetaData().
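A minimal sketch of attaching per-cell metadata with `AddMetaData()`, using toy data. The column names `cell` and `sample` are hypothetical; the real metadata file may be laid out differently. The key assumption is that the row names of the metadata data frame match the cell names (column names) of the count matrix.

```r
library(Matrix)
library(Seurat)

# Toy counts: genes in rows, cells in columns.
set.seed(1)
counts <- Matrix(
  matrix(rpois(100 * 10, lambda = 0.5), nrow = 100,
         dimnames = list(paste0("gene", 1:100), paste0("cell", 1:10))),
  sparse = TRUE
)
seuratObj <- CreateSeuratObject(counts = counts)

# Toy metadata table; "cell" and "sample" are hypothetical column names.
metadata <- data.frame(
  cell = colnames(counts),
  sample = rep(c("A", "B"), each = 5),
  stringsAsFactors = FALSE
)

# AddMetaData() matches rows to cells by row name.
rownames(metadata) <- metadata$cell
seuratObj <- AddMetaData(seuratObj, metadata = metadata[, "sample", drop = FALSE])

# Inspect the per-cell metadata now stored in the object.
head(seuratObj[[]])
```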

Read10X does not work because the barcode.txt.gz file is missing.

ADD REPLY • link 22 months ago by Sun • 0

@Sun, you are right, I missed the part where you only have a counts file. In that case, I think you can try fread() from the data.table package to read the count data. You can then convert it to a sparse matrix before passing it to CreateSeuratObject(), as biofalconch suggested above.
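Something along these lines might work. This is only a sketch on a toy file written to a temporary directory. It assumes the counts table has gene names in the first column and one column per cell, and that gene names are unique. Reading a `.gz` file directly with `fread()` also assumes the R.utils package is installed.

```r
library(data.table)
library(Matrix)
library(Seurat)

# Write a toy stand-in for xxx_counts.txt.gz (genes in rows, cells in columns).
set.seed(1)
toy <- matrix(rpois(50 * 8, lambda = 1), nrow = 50,
              dimnames = list(paste0("gene", 1:50), paste0("cell", 1:8)))
counts_file <- file.path(tempdir(), "toy_counts.txt.gz")
fwrite(data.frame(gene = rownames(toy), toy, check.names = FALSE),
       counts_file, sep = "\t")

# fread() is usually much faster than read.table() on large tables.
dt <- fread(counts_file)

# Use the first column as row names and the remaining columns as counts.
dense_counts <- as.matrix(dt, rownames = 1)

# Convert to a sparse matrix, then free the dense intermediates.
sparse_counts <- Matrix(dense_counts, sparse = TRUE)
rm(dt, dense_counts)

seuratObj <- CreateSeuratObject(counts = sparse_counts, project = "toy")
```

Note that the dense table still has to fit in memory once before the conversion, so this reduces the footprint of the final object rather than the peak during loading.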

ADD REPLY • link 22 months ago by Haci ▴ 730

score 0 · Answer 2 · 2023-06-16

22 months ago

swbarnes2 14k

I'd find some sample 10X output files, and make a dummy barcode file. It doesn't really matter what the barcodes are, they are just names for cells, so it doesn't matter if they aren't the "real" barcode sequences.
