Hi,
I have single-cell RNA-seq data in matrix.mtx format, downloaded from 10x Genomics. I want to store it as a sparse (with a lot of zeros) read-count file in txt format. How can I do that?
Your question is unclear. If you want to store the matrix in a sparse format, that is the format that does not store the zeros at all. I am assuming that you already have a matrix in sparse (MatrixMarket) format, but want to convert it into dense format. You can clear this up by showing us the first line of your file:
head -1 matrix.mtx
Your matrix is already sparse if the screen output is something like this:
%%MatrixMarket matrix coordinate integer general
If so, this thread explains the conversion. If you actually have a dense matrix (with lots of zeros) and want to convert it into sparse format, this thread will show you how. If needed, I have a custom Python script for the dense -> sparse task as well.
Hi, if using R, DropletUtils is what you need: https://www.bioconductor.org/packages/release/bioc/html/DropletUtils.html
Kevin
A "sparse" matrix format does not store the zeros. 10x data in the three-file output format is already sparse. If you want non-sparse (dense) data, cellranger has a mat2csv command.
Hi Mensur, thank you for your response. Yes, I mean that I need dense data (by your definition, data with a lot of zeros) to use in deep learning algorithms. I followed the instructions in the thread you shared, and there were two options. One uses Python, but it only reads the data and does not do any conversion. The other uses Cell Ranger, right? If so, should I install it on Unix/Linux?
SciPy can read and write MatrixMarket files, and the page I referenced before already has an example of how to read the matrix. Once it is loaded, SciPy can also convert it to a dense matrix or a NumPy array. Note that you will need a very large amount of memory for this conversion, and it may in fact be impossible depending on your computer's RAM. Assuming the conversion works, you'll probably want to save the result as a compressed array because the file will be huge.
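A minimal sketch of those steps, assuming the file is in coordinate (sparse) MatrixMarket format. The helper name mtx_to_dense_npz and the file names are made up for illustration, and the tiny toy matrix only stands in for a real matrix.mtx:

```python
import os
import tempfile

import numpy as np
from scipy import io, sparse


def mtx_to_dense_npz(mtx_path, npz_path):
    """Load a MatrixMarket file, densify it, and save it compressed."""
    counts = io.mmread(mtx_path)   # sparse (COO) for coordinate-format files
    dense = counts.toarray()       # needs rows * cols * itemsize bytes of RAM
    np.savez_compressed(npz_path, counts=dense)
    return dense


# Toy stand-in for a real matrix.mtx: 3 genes (rows) x 4 cells (columns).
toy = sparse.coo_matrix(np.array([[0, 2, 0, 0],
                                  [1, 0, 0, 3],
                                  [0, 0, 0, 0]]))
workdir = tempfile.mkdtemp()
mtx_path = os.path.join(workdir, "matrix.mtx")
npz_path = os.path.join(workdir, "counts_dense.npz")
io.mmwrite(mtx_path, toy)

dense = mtx_to_dense_npz(mtx_path, npz_path)
reloaded = np.load(npz_path)["counts"]   # read the compressed array back
```

On a real dataset the toarray() call is the step that can exhaust memory, so it is worth checking rows * cols against available RAM before running it.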
I am explaining how to do this because you asked, but I recommend against it. All machine learning (ML) tools will struggle with dense datasets of this size, especially given the level of sparsity. Almost all types of modern ML models - (extreme) gradient boosting, random forests, even support vector machines - work with sparse matrices without any conversion. If you absolutely require dense data, I suggest truncated SVD for converting the sparse matrix into low-dimensional dense data.
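As a rough sketch of the truncated SVD suggestion, here is one way to do it with scipy.sparse.linalg.svds. The function name truncated_svd_embedding, the random stand-in matrix, and the choice of 20 components are assumptions for illustration only:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import svds


def truncated_svd_embedding(counts, n_components):
    """Return a dense (n_rows x n_components) embedding of a sparse matrix."""
    u, s, _ = svds(counts.astype(np.float64), k=n_components)
    order = np.argsort(s)[::-1]    # sort components by decreasing singular value
    return u[:, order] * s[order]  # scale left singular vectors by singular values


# Random stand-in for a cells x genes matrix: 200 cells, 1000 genes, ~5% nonzero.
counts = sparse.random(200, 1000, density=0.05, format="csr", random_state=0)
embedding = truncated_svd_embedding(counts, n_components=20)
```

The input never gets densified; only the small output is dense. A 10x matrix.mtx stores genes as rows and cells as columns, so to get one row per cell you would transpose the loaded matrix (for example with .T) before calling the function.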
The reason I need a dense matrix is that I need to have genes in the rows and cells in the columns. What would you suggest for getting data in this layout?