Storing a matrix.mtx file in a gene expression matrix CSV/TXT format
4.5 years ago
prgrmmr70 • 0

Hi,

I have single-cell RNA-seq data in matrix.mtx format, downloaded from 10x Genomics. I want to store it as a sparse (with a lot of zeros) read-count file in txt format. How can I do that?

RNA-Seq • 5.8k views
4.5 years ago
Mensur Dlakic ★ 28k

Your question is unclear. If you want to store the matrix in a sparse format, that would be the one without any zeros. I am assuming that you already have a matrix in sparse (MatrixMarket) format, but want to convert it into dense format. You can clear this up by showing us the first line of your file:

head -1 matrix.mtx

Your matrix is already sparse if the screen output is something like this:

%%MatrixMarket matrix coordinate integer general

If so, this thread explains the conversion. If you actually have a dense matrix (with lots of zeros) and want to convert it into sparse format, this thread will show you how. If needed, I have a custom python script for the dense -> sparse task as well.


Hi Mensur, thank you for your response. Yes, I mean that I need dense data (by your definition, data with a lot of zeros) to use in deep learning algorithms. I followed the instructions in the thread you shared, and there were two options. One uses Python, but it only reads the data and does not do any conversion. The other uses Cell Ranger, right? If so, should I install it on Unix/Linux?


SciPy can read and write MatrixMarket files, and in that page I referenced before you already have an example of how to read the matrix. Once loaded, SciPy will also convert to a dense matrix or a numpy array. Note that you will need very large memory for this conversion, and in fact it may be impossible to do depending on your computer's RAM. Assuming the conversion works, you'll probably want to save it as a compressed array because the file will be huge.
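A minimal sketch of the read, densify, and compress steps described above. The toy matrix, the temporary directory, and the file name `counts_dense.npz` are assumptions made only so the example runs on its own; with real data, `mtx_path` would point at the downloaded matrix.mtx.

```python
import os
import tempfile

import numpy as np
from scipy import io, sparse

# Assumption: a tiny toy matrix stands in for the real matrix.mtx.
workdir = tempfile.mkdtemp()
mtx_path = os.path.join(workdir, "matrix.mtx")
toy = sparse.coo_matrix(np.array([[0, 3, 0],
                                  [1, 0, 0],
                                  [0, 0, 2],
                                  [0, 0, 0]]))
io.mmwrite(mtx_path, toy)

# Read the MatrixMarket file; coordinate files come back as a sparse matrix.
counts = io.mmread(mtx_path)

# Convert to a dense NumPy array. This is the memory-hungry step:
# the array needs rows * columns * itemsize bytes of RAM.
dense = counts.toarray()

# Save as a compressed array, since the dense data is mostly zeros.
npz_path = os.path.join(workdir, "counts_dense.npz")  # hypothetical file name
np.savez_compressed(npz_path, counts=dense)

# Load it back to confirm the round trip.
restored = np.load(npz_path)["counts"]
```

Before calling `toarray()` on real data, it may be worth multiplying the two dimensions of `counts.shape` by the item size to estimate whether the dense array will fit in memory.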

I am explaining how to do this because you asked, but I recommend against it. All machine learning (ML) tools will struggle with dense datasets of this size, especially given its level of sparsity. Almost all types of modern ML models - (extreme) gradient boosting, random forests, even support vector machines - work with sparse matrices without any conversion. If you absolutely require dense data, I suggest truncated SVD for converting the sparse matrix into low-dimension dense data.
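A rough sketch of the truncated SVD suggestion, using `scipy.sparse.linalg.svds` on a randomly generated toy matrix. The matrix size, the orientation (cells in rows, genes in columns), and the choice of 10 components are assumptions for illustration, not recommendations for real data.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import svds

# Assumption: a toy cells x genes count matrix, mostly zeros.
rng = np.random.default_rng(0)
toy_counts = rng.poisson(0.05, size=(200, 500)).astype(np.float64)
X = sparse.csr_matrix(toy_counts)

# Compute only the k largest singular triplets; X is never densified.
k = 10
U, s, Vt = svds(X, k=k)

# The order of the returned singular values is not guaranteed to be
# descending, so sort the components explicitly.
order = np.argsort(s)[::-1]

# Low-dimensional dense representation: one row per cell, k columns.
embedding = U[:, order] * s[order]
```

The resulting `embedding` is dense but has only `k` columns, so it stays small even when the original matrix has tens of thousands of genes.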


The reason I need a dense matrix is that I need to have genes in the rows and cells in the columns. So, what would you suggest for getting this type of data?

4.5 years ago

Hi, if using R, DropletUtils is what you need: https://www.bioconductor.org/packages/release/bioc/html/DropletUtils.html

Kevin

4.5 years ago

A matrix in "sparse" format does not store a lot of zeros. 10x data in the three-file output format is already sparse. If you want non-sparse data, cellranger has a mat2csv function.
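For readers without Cell Ranger, a dense CSV with genes in rows and cells in columns could be written along these lines in Python. This is only a sketch, not what mat2csv does internally. The gene and cell names below are hypothetical placeholders; with real 10x output they would be read from the two files that accompany matrix.mtx, and the matrix itself is typically already oriented as genes (rows) by cells (columns).

```python
import csv
import os
import tempfile

import numpy as np
from scipy import io, sparse

# Assumption: a toy 3 genes x 2 cells matrix stands in for the real file.
workdir = tempfile.mkdtemp()
mtx_path = os.path.join(workdir, "matrix.mtx")
io.mmwrite(mtx_path, sparse.coo_matrix(np.array([[0, 5],
                                                 [2, 0],
                                                 [0, 0]])))
genes = ["GeneA", "GeneB", "GeneC"]  # hypothetical row labels
cells = ["Cell1", "Cell2"]           # hypothetical column labels

# CSR layout allows cheap row slicing, so only one row is dense at a time.
counts = io.mmread(mtx_path).tocsr()

csv_path = os.path.join(workdir, "counts.csv")  # hypothetical output name
with open(csv_path, "w", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["gene"] + cells)
    for i, gene in enumerate(genes):
        row = counts[i:i + 1, :].toarray().ravel()
        writer.writerow([gene] + row.tolist())

# Read the file back to inspect the result.
with open(csv_path, newline="") as handle:
    rows = list(csv.reader(handle))
```

Writing row by row keeps memory use low, although the CSV file on disk will still be large for a full dataset.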
