Hi,
I am new to GEO and a novice at R (I have used Seurat for my own scRNA-seq data), so apologies for how basic this question is.
I am trying to replicate figure 1 from a recent paper using Seurat. They've deposited their data as GSE220939. I am having trouble understanding the organization of the data. There are 22 patients. It seems the data was deposited as two matrices for each individual patient: a raw matrix, and a matrix from which low-expressing cells have been removed.
My question: What is the best way to reconstruct a total count matrix for the paper? Essentially all patients as columns and genes as rows.
Any advice on the basics like automatically downloading the matrices and combining them would be helpful.
Thank you so much.
Thank you. I was successfully able to combine everything in the raw files. It made a matrix of 36601 x 141326219.
For the matrices where low-expressing cells are removed, would cbind also work? How does it work when some matrices don't have a gene and others do? Will it be able to combine them appropriately?
Removing cells does not mean removing genes. Cells are columns, genes are rows. Anyway, matching unequal objects requires something smarter than cbind, for example `merge` from base R, or one of the many `*_join` functions from dplyr, like `full_join`.
141 million columns, are you sure?
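To illustrate matching matrices whose gene sets differ, here is a minimal base R sketch on two toy matrices. The gene names, cell names, and the helper `pad_to_genes` are made up for illustration and are not from GSE220939. Filling absent genes with zero is an assumption that may not suit every dataset. `merge` with `all = TRUE` leaves `NA` instead.

```r
# Two toy count matrices (genes x cells) with partly different gene sets.
# All names are hypothetical, for illustration only.
m1 <- matrix(c(1, 0, 3, 2, 5, 0), nrow = 3,
             dimnames = list(c("GeneA", "GeneB", "GeneC"),
                             c("p1_cell1", "p1_cell2")))
m2 <- matrix(c(4, 1, 0, 7), nrow = 2,
             dimnames = list(c("GeneB", "GeneD"),
                             c("p2_cell1", "p2_cell2")))

# cbind(m1, m2) would fail here because the row counts differ (3 vs 2).

# Option 1: merge() on row names; genes missing from one matrix become NA.
merged <- merge(as.data.frame(m1), as.data.frame(m2),
                by = "row.names", all = TRUE)

# Option 2 (hypothetical helper): pad each matrix to the union of genes,
# filling absent genes with 0 (an assumption), then cbind the padded matrices.
pad_to_genes <- function(m, all_genes) {
  out <- matrix(0, nrow = length(all_genes), ncol = ncol(m),
                dimnames = list(all_genes, colnames(m)))
  out[rownames(m), ] <- m
  out
}

all_genes <- union(rownames(m1), rownames(m2))
combined <- cbind(pad_to_genes(m1, all_genes),
                  pad_to_genes(m2, all_genes))
dim(combined)
```

These dense toy matrices are only meant to show the row matching. Matrices of the size mentioned above would need a sparse representation rather than this approach as written.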