GEO Dataset: Difficulty Understanding Matrices, Compilation
14 months ago

Hi,

I am new to GEO and a novice at R (I have used Seurat for my own scRNA-seq data), so apologies for how basic this question is:

I am trying to replicate Figure 1 from a recent paper using Seurat. The authors deposited their data as GSE220939, and I am having trouble understanding how the data are organized. There are 22 patients. It seems the data were deposited as two matrices per patient: a raw matrix and a matrix from which low-expressing cells have been removed.

My question: What is the best way to reconstruct a total count matrix for the paper? Essentially all patients as columns and genes as rows.

Any advice on the basics like automatically downloading the matrices and combining them would be helpful.

Thank you so much.

GEO scRNA-Seq • 815 views
14 months ago
ATpoint 86k

Download the entire supplementary file, unpack it, then read each of the mtx files into R using Matrix::readMM(). Then use cbind to combine them. The same goes for the barcode and gene files; for those you can use regular read.delim.
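As a rough illustration of the steps above, here is a minimal sketch. It assumes the unpacked directory holds one `*matrix.mtx`, `*barcodes.tsv` and `*features.tsv` triplet per sample sharing a common file-name prefix; that naming is an assumption and may not match the actual files in GSE220939. The helpers `read_sample` and `combine_samples` are hypothetical names, not part of any package.

```r
library(Matrix)

# Hypothetical helper: read one sample's matrix/barcodes/features triplet.
# `prefix` is the shared start of the three file names (assumed naming scheme),
# `ext` is "" or ".gz" depending on whether the files are compressed.
read_sample <- function(prefix, ext, sample_id) {
  counts   <- as(readMM(paste0(prefix, "matrix.mtx", ext)), "CsparseMatrix")
  barcodes <- read.delim(paste0(prefix, "barcodes.tsv", ext),
                         header = FALSE, stringsAsFactors = FALSE)[[1]]
  features <- read.delim(paste0(prefix, "features.tsv", ext),
                         header = FALSE, stringsAsFactors = FALSE)[[1]]
  # Genes are rows, cells are columns
  stopifnot(nrow(counts) == length(features),
            ncol(counts) == length(barcodes))
  rownames(counts) <- features
  # Prefix barcodes with the sample ID so they stay unique after combining
  colnames(counts) <- paste(sample_id, barcodes, sep = "_")
  counts
}

# Hypothetical helper: read every sample in `dir` and combine by column.
combine_samples <- function(dir) {
  mtx_files <- sort(list.files(dir, pattern = "matrix\\.mtx(\\.gz)?$",
                               full.names = TRUE))
  mats <- lapply(mtx_files, function(f) {
    prefix <- sub("matrix\\.mtx(\\.gz)?$", "", f)
    ext    <- if (grepl("\\.gz$", f)) ".gz" else ""
    read_sample(prefix, ext, sample_id = sub("_$", "", basename(prefix)))
  })
  # cbind is only valid when all samples have the same genes in the same order
  same_genes <- vapply(mats, function(m) {
    identical(rownames(m), rownames(mats[[1]]))
  }, logical(1))
  stopifnot(all(same_genes))
  do.call(cbind, mats)
}
```

Calling `combine_samples("path/to/unpacked")` (a placeholder path) would return one sparse genes x cells matrix. Its number of columns should equal the total number of barcodes across all samples, which is a quick sanity check on the result.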


Thank you. I was able to combine everything in the raw files. It made a matrix of 36601 x 141326219.

For the matrices where low-expressing cells are removed, would cbind also work? How does it work when some matrices don't have a gene and others do? Will it be able to combine them appropriately?


Removing cells does not mean removing genes. Cells are columns, genes are rows. Anyway, matching unequal objects requires something smarter than cbind, for example merge from base R, or one of the many *_join functions from dplyr, like full_join.
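If the gene sets ever did differ between samples, merge or full_join would work on data frames but would convert the sparse count matrices to dense ones. A sparse-friendly alternative (not the approach named above, just one possible sketch) is to pad every matrix to the union of all genes and then cbind. The helpers `pad_to_genes` and `cbind_union` are hypothetical names, and the sketch assumes each matrix has unique gene names as row names and that a gene missing from a sample can be treated as zero counts, whereas full_join would give NA.

```r
library(Matrix)

# Hypothetical helper: expand a genes x cells sparse matrix to a common gene
# set, filling genes absent from this sample with zeros (an assumption).
pad_to_genes <- function(m, all_genes) {
  idx <- match(rownames(m), all_genes)
  # Selector matrix with a 1 where a row of `m` maps to a row of the output;
  # multiplying by it reorders the rows of `m` and adds all-zero rows.
  selector <- sparseMatrix(i = idx, j = seq_len(nrow(m)), x = 1,
                           dims = c(length(all_genes), nrow(m)))
  out <- selector %*% m
  dimnames(out) <- list(all_genes, colnames(m))
  out
}

# Hypothetical helper: column-bind matrices over the union of their genes.
cbind_union <- function(mats) {
  all_genes <- Reduce(union, lapply(mats, rownames))
  do.call(cbind, lapply(mats, pad_to_genes, all_genes = all_genes))
}
```

Here `mats` would be a list of per-sample sparse matrices with gene names as row names and unique cell names as column names.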

"It made a matrix of 36601 x 141326219."

141 million columns, are you sure?
