In their methods, the authors state that the matrix is in MatrixMarket format:
And the count matrix is in MatrixMarket format.
However, the file on GEO is missing the MatrixMarket (MM) header. Normally, the first non-comment line (the first line that does not start with %) gives the number of rows, the number of columns, and the number of non-zero values. This tells whatever reader opens the file the dimensions of the original matrix. The file on GEO goes straight to the data entries:
curl 'https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM5596nnn/GSM5596828/suppl/GSM5596828%5FSMSG%5F12%5F13w%5Fumi%5Fcounts.matrix.gz' | gzip -dc | head -n5
7553 1 1
15460 2 1
15557 3 1
19314 3 1
You could guess that the values in the first and second columns are the row and column indices, but it would be better to check with the original depositor of the data, since they uploaded a broken file and knowing the number of rows and columns is critical for merging the cell barcodes and features into the matrix.
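As a quick sanity check on that guess, here is a minimal sketch that summarises a headerless triplet file. It assumes the first two columns are 1-based row and column indices, which is exactly the part the depositor should confirm. The function name triplet_summary and the file names in the comments are made up for illustration.

```shell
# Summarise a headerless triplet file: the number of entries and the
# largest value seen in columns 1 and 2.
# ASSUMPTION: column 1 = row index, column 2 = column index (1-based).
triplet_summary() {
  awk '{
    if ($1 + 0 > max_row) max_row = $1 + 0
    if ($2 + 0 > max_col) max_col = $2 + 0
  }
  END { printf "entries=%d max_row=%d max_col=%d\n", NR, max_row, max_col }' "$1"
}

# Hypothetical usage (file names are placeholders):
#   gzip -dc counts.matrix.gz > triplets.txt
#   triplet_summary triplets.txt
#   gzip -dc features.tsv.gz | wc -l   # should be >= max_row
#   gzip -dc barcodes.tsv.gz | wc -l   # should be >= max_col
```

The maxima are only lower bounds on the real dimensions, because trailing all-zero rows or columns leave no entries behind. If max_row exceeds the number of features, or max_col exceeds the number of barcodes, the guess about the column order is probably wrong.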
Here's an example of what the header of the file should look like. I'll create an example sparse matrix in R and then save it to a file.
library("Matrix")
set.seed(420)
# Create the example matrix.
mat <- matrix(sample(c(0, 1, 2), 20, replace=TRUE), nrow=5)
> mat
     [,1] [,2] [,3] [,4]
[1,]    0    0    0    0
[2,]    0    0    2    1
[3,]    1    0    0    2
[4,]    1    0    0    1
[5,]    0    0    0    2
# Save it as a sparse matrix MM format.
writeMM(as(mat, "sparseMatrix"), "matrix.mtx")
Now we can check the header of this file.
head -n5 'matrix.mtx'
%%MatrixMarket matrix coordinate integer general
5 4 7
3 1 1
4 1 1
2 3 2
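Given that layout, one possible workaround is to prepend the two missing header lines yourself. The sketch below is not a confirmed fix for the GEO file. It assumes the three columns are row index, column index, and an integer count, and that you supply the true dimensions (ideally the line counts of the features and barcodes files). The function name add_mm_header is hypothetical.

```shell
# Prepend a MatrixMarket header to a headerless triplet file.
# Usage: add_mm_header TRIPLETS N_ROWS N_COLS > matrix.mtx
# ASSUMPTIONS: columns are "row col value", indices are 1-based,
# and the values are integer counts (hence the "integer" field).
add_mm_header() {
  triplets=$1 n_rows=$2 n_cols=$3
  # Count non-blank lines; each one is a non-zero entry.
  n_entries=$(awk 'NF { n++ } END { print n + 0 }' "$triplets")
  printf '%%%%MatrixMarket matrix coordinate integer general\n'
  printf '%s %s %s\n' "$n_rows" "$n_cols" "$n_entries"
  # Emit the entries unchanged, skipping blank lines.
  awk 'NF' "$triplets"
}
```

The dimensions are passed in rather than inferred from the largest indices, because all-zero rows or columns at the end of the matrix would otherwise be dropped and the matrix would no longer line up with the barcodes and features.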
For the normal Seurat workflow, those files are usually put in one directory and imported with the Read10X function. If you go this route, you need to rename the three files to barcodes.tsv.gz, features.tsv.gz, and matrix.mtx.gz, as per the normal CellRanger convention.
I did try this and then this error pops up:
Error in readMM(file = matrix.loc) : file is not a MatrixMarket file
I then tried to read the matrix file directly with readMM(), and it also says it is not a MatrixMarket file, so I think the matrix.gz file I downloaded directly from GEO is not actually a matrix in that format...
I am actively trying to figure out a solution to this but if you have any more insights, I would welcome them!