Create Seurat Object from matrix and two text files
13 months ago
Sky ▴ 10

The data available for this Sci-sequencing sample (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM5596828) consist of a counts matrix (matrix.gz), a cell annotation file (txt.gz), and a gene annotation file (txt.gz). I am having trouble combining these three files into a single Seurat object. Does anyone have any suggestions?

Single-Cell RNA-Seq Seurat • 1.8k views

For the normal Seurat workflow those files are usually put in one directory and imported with the Read10X function.

If you go this route you need to rename the three files to barcodes.tsv.gz, features.tsv.gz, and matrix.mtx.gz as per the normal CellRanger convention.
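As a rough sketch of that renaming step, assuming the matrix file is a valid MatrixMarket file: the helper name stage_for_read10x and the demo file names below are made up for illustration, and the demo uses empty placeholder files just to show the resulting layout. Read10X() may also need its gene.column argument adjusted if the gene file has a single column.

```r
# Hypothetical helper (not part of Seurat): copy the three downloaded files
# into one directory under the names that Read10X() looks for.
stage_for_read10x <- function(matrix_file, barcodes_file, features_file, out_dir) {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  sources <- c(matrix_file, barcodes_file, features_file)
  targets <- c("matrix.mtx.gz", "barcodes.tsv.gz", "features.tsv.gz")
  ok <- file.copy(sources, file.path(out_dir, targets), overwrite = TRUE)
  stopifnot(all(ok))
  invisible(out_dir)
}

# Demo with empty placeholder files standing in for the GEO downloads.
src <- tempfile("geo_")
dir.create(src)
demo_files <- file.path(src, c("counts.matrix.gz", "cells.txt.gz", "genes.txt.gz"))
file.create(demo_files)

staged <- stage_for_read10x(demo_files[1], demo_files[2], demo_files[3],
                            file.path(src, "tenx"))
list.files(staged)

# With real, valid files the import would then be:
# counts <- Seurat::Read10X(data.dir = staged)
```

Copying rather than renaming keeps the original downloads untouched, which helps if the import fails and the files need a closer look.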


I did try this and then this error pops up: Error in readMM(file = matrix.loc) : file is not a MatrixMarket file

I then tried to read in the matrix file itself by using readMM() and it says it is not a MatrixMarket file so I think that the matrix.gz file I downloaded directly from GEO is not actually a matrix...

I am actively trying to figure out a solution to this but if you have any more insights, I would welcome them!

13 months ago

Their methods state that the count matrix is in MatrixMarket format:

"And the count matrix is in MatrixMarket format."

However, the file on GEO is missing the MatrixMarket header. Normally the file starts with a %%MatrixMarket banner line, and the first line that is not a % comment gives the number of rows, the number of columns, and the number of non-zero entries. This tells whatever reader opens the file the dimensions of the original matrix. The file on GEO has neither line.

curl 'https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM5596nnn/GSM5596828/suppl/GSM5596828%5FSMSG%5F12%5F13w%5Fumi%5Fcounts.matrix.gz' | gzip -dc | head -n5

7553    1       1
15460   2       1
15557   3       1
19314   3       1

You could guess that the values in the first and second columns are the row and column indices, but it would be better to check with the original depositor of the data, since they uploaded a broken file and knowing which column holds the row index and which holds the column index is critical for matching the cell barcodes and features to the matrix.
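If you do go with that guess, here is a minimal sketch of rebuilding the matrix directly from the triplets. It assumes (unconfirmed) that column 1 is the 1-based gene (row) index, column 2 is the 1-based cell (column) index, and column 3 is the count. The helper name read_headerless_triplets is made up, and n_genes and n_cells are assumed to come from the number of entries in the gene and cell annotation files. The demo runs on a tiny toy file, not the GEO data.

```r
library(Matrix)

# Hypothetical helper: build a sparse matrix from a headerless triplet file.
# ASSUMPTION: column 1 = gene (row) index, column 2 = cell (column) index,
# column 3 = count, all 1-based. Confirm this with the data depositor.
read_headerless_triplets <- function(path, n_genes, n_cells) {
  trip <- read.table(path, header = FALSE,
                     col.names = c("i", "j", "x"),
                     colClasses = c("integer", "integer", "numeric"))
  # Indices outside the annotation sizes mean the guess (or the sizes) is wrong.
  stopifnot(min(trip$i) >= 1, max(trip$i) <= n_genes,
            min(trip$j) >= 1, max(trip$j) <= n_cells)
  sparseMatrix(i = trip$i, j = trip$j, x = trip$x, dims = c(n_genes, n_cells))
}

# Toy demo: a gzipped, whitespace-separated triplet file with no header.
demo_path <- tempfile(fileext = ".matrix.gz")
con <- gzfile(demo_path, "w")
writeLines(c("3\t1\t1", "4\t1\t1", "2\t3\t2"), con)
close(con)

m <- read_headerless_triplets(demo_path, n_genes = 5, n_cells = 4)
dim(m)
```

If the stopifnot() check fails on the real data, that is a hint that the two index columns are the other way round or that the annotation files include a header line.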

Here's an example of what the header of the file should look like. I'll create an example sparse matrix in R and then save it to a file.

library("Matrix")

set.seed(420)

# Create the example matrix.
mat <- matrix(sample(c(0, 1, 2), 20, replace=TRUE), nrow=5)

mat
#      [,1] [,2] [,3] [,4]
# [1,]    0    0    0    0
# [2,]    0    0    2    1
# [3,]    1    0    0    2
# [4,]    1    0    0    1
# [5,]    0    0    0    2

# Save it as a sparse matrix MM format.
writeMM(as(mat, "sparseMatrix"), "matrix.mtx")

Now we can check the header of this file.

head -n5 'matrix.mtx'

%%MatrixMarket matrix coordinate integer general
5 4 7
3 1 1
4 1 1
2 3 2
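Going the other way, a headerless file can be repaired by writing a copy with those two header lines prepended, so that readMM() accepts it. This is only a sketch: the helper name add_mm_header is made up, the row and column counts are assumed to come from the gene and cell annotation files, and it reads the whole file into memory, which may be slow for a large matrix. The demo uses the seven non-zero entries of the example matrix above.

```r
library(Matrix)

# Hypothetical helper: write a copy of a headerless triplet file with a
# MatrixMarket banner and size line prepended.
# ASSUMPTION: n_rows and n_cols are known from the annotation files.
add_mm_header <- function(in_path, out_path, n_rows, n_cols) {
  body <- readLines(in_path)
  body <- body[nzchar(trimws(body))]  # drop blank lines
  header <- c("%%MatrixMarket matrix coordinate integer general",
              sprintf("%d %d %d", as.integer(n_rows), as.integer(n_cols),
                      length(body)))
  writeLines(c(header, body), out_path)
  invisible(out_path)
}

# Toy demo: the non-zero entries of the 5 x 4 example matrix, without a header.
raw_path <- tempfile(fileext = ".txt")
writeLines(c("3 1 1", "4 1 1", "2 3 2", "2 4 1", "3 4 2", "4 4 1", "5 4 2"),
           raw_path)

fixed_path <- add_mm_header(raw_path, tempfile(fileext = ".mtx"),
                            n_rows = 5, n_cols = 4)
fixed <- readMM(fixed_path)
dim(fixed)
```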
4 months ago

I encountered the same problem, and the code below seems to solve it. It is probably too late for the original poster, but it may help others.

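library(Matrix)
library(data.table)  # provides fread(); reading .gz files also needs R.utils
library(Seurat)

# Read the count matrix (readMM() needs a valid MatrixMarket header).
mtx <- readMM('your_file.mtx.gz')
mtx <- as(mtx, "CsparseMatrix")  # column-compressed format, commonly used with Seurat
mtx[1:4, 1:4]
dim(mtx)

# Cell annotation: one row per matrix column.
cl <- fread('your_file.barcodes.txt.gz', header = FALSE, data.table = FALSE)
head(cl)

# Gene annotation: one row per matrix row.
rl <- fread('your_file.genes.txt.gz', header = FALSE, data.table = FALSE)
head(rl)

# The annotations must match the matrix dimensions before naming.
stopifnot(nrow(rl) == nrow(mtx), nrow(cl) == ncol(mtx))
rownames(mtx) <- rl$V1
colnames(mtx) <- cl$V1

obj <- CreateSeuratObject(counts = mtx, min.cells = 10)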
