Question

Retrieving single-cell dataset from GEO database

16 months ago

melissachua90 ▴ 70

I want to retrieve the dataset GSE152938 from the GEO database and convert it to a Seurat object. However, my code below returned an empty dataframe.

Is the dataset itself empty or did I import the dataset wrongly?

library(GEOquery)  # provides getGEO()
library(Biobase)   # provides pData() and exprs()

gse <- getGEO("GSE152938", GSEMatrix = TRUE)

eset <- gse[[1]]
pData(eset)
exprs(eset)

Output:

exprs(eset)
     GSM4630027 GSM4630028 GSM4630029 GSM4630030 GSM4630031

Convert to Seurat:

seurat <- CreateSeuratObject(exprs(eset))

seurat GEO r single-cell • 1.5k views

ADD COMMENT • link updated 16 months ago by Ram 44k • written 16 months ago by melissachua90 ▴ 70

melissachua90 why did you delete this post?

ADD REPLY • link 16 months ago by Ram 44k

score 0 · Answer 1 · 2023-08-17

16 months ago

ATpoint 86k

getGEO only returns expression values for array data, not for NGS data. You cannot retrieve the counts for this dataset with it, so an empty eset is expected. Use the supplementary files the authors provide on the GEO page. If those are not what you need, either email the authors for processed data or download the FASTQ files and preprocess them yourself.
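
As a rough sketch of fetching the supplementary files from a terminal, the helper below builds the download location for a series accession. The function name geo_suppl_url is hypothetical, and the URL pattern is an assumption based on the usual layout of GEO's FTP area (series/<stub>nnn/<accession>/suppl/), so check it against the links shown on the GEO page before relying on it.

```shell
# Sketch: build the supplementary-file URL for a GEO series accession.
# ASSUMPTION: GEO's FTP area is laid out as series/<stub>nnn/<accession>/suppl/,
# where <stub> is the accession with its last three digits dropped.
# geo_suppl_url is a hypothetical helper name, not part of any package.
geo_suppl_url() {
    local acc="$1"
    local digits="${acc#GSE}"   # numeric part of the accession
    local prefix=""
    if [ "${#digits}" -gt 3 ]; then
        prefix="${digits%???}"  # drop the last three digits
    fi
    printf 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE%snnn/%s/suppl/\n' "$prefix" "$acc"
}

# Hypothetical usage (needs network access and wget):
# wget "$(geo_suppl_url GSE152938)GSE152938_RAW.tar"
```

The file name GSE152938_RAW.tar is the one mentioned later in this thread; other series may name their supplementary files differently.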

ADD COMMENT • link 16 months ago by ATpoint 86k

Thanks for the advice. I downloaded the supplementary file and read it into R but it threw an error.

data <- data.table::fread("GSE152938_RAW.tar", data.table=F)

Traceback:

Avoidable 501.354 seconds. This file is very unusual: it ends abruptly without a final newline, and also its size is a multiple of 4096 bytes. Please properly end the last row with a newline using for example 'echo >> file' to avoid this  time to copy.
Error in data.table::fread("GSE152938_RAW.tar", data.table = F) : 
  embedded nul in string: ')]\v\xfe\xab+\xf0\035...'
In addition: Warning message:
In data.table::fread("GSE152938_RAW.tar", data.table = F) :
  Detected 3 column names but the data has 2 columns. Filling rows automatically. Set fill=TRUE explicitly to avoid this warning.
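
The error above is consistent with fread being pointed at a tar archive rather than a delimited text table: the archive has to be unpacked before anything in it can be read. A minimal terminal sketch follows. It assumes the bundle holds one .tar.gz per sample (as the GSM4630031_Normal_kidney.tar.gz file mentioned in this thread suggests), and unpack_geo_raw is a hypothetical helper name.

```shell
# Sketch: unpack a GEO *_RAW.tar bundle and any per-sample .tar.gz inside it.
# ASSUMPTION: the bundle holds one .tar.gz per sample; unpack_geo_raw is a
# hypothetical helper name, not part of any package.
unpack_geo_raw() {
    local archive="$1" outdir="$2" sample
    mkdir -p "$outdir"
    # The outer file is an uncompressed tar archive, so no -z flag here.
    tar -xf "$archive" -C "$outdir"
    # Unpack each per-sample archive into the same output directory.
    for sample in "$outdir"/*.tar.gz; do
        [ -e "$sample" ] || continue   # no matches: the glob stays literal
        tar -xzf "$sample" -C "$outdir"
    done
}

# Hypothetical usage:
# unpack_geo_raw GSE152938_RAW.tar my_workingDir
```

Running tar -tf GSE152938_RAW.tar first lists the contents without extracting anything, which is a quick way to confirm what the bundle actually contains.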

ADD REPLY • link 16 months ago by melissachua90 ▴ 70

Hey, you can process the individual pathological types as follows. Here is a demo for processing the normal kidney sample:

# Run these in your terminal
mkdir my_workingDir
cd my_workingDir
# Download GSM4630031_Normal_kidney.tar.gz into your working directory (my_workingDir), then extract the files:
tar -xzvf GSM4630031_Normal_kidney.tar.gz
# You will see the barcodes.tsv.gz, features.tsv.gz and matrix.mtx.gz files

# Run these in R
library(Seurat)
# Point Read10X at the directory holding the extracted files (adjust the path to your own)
Normal_kidney <- Read10X("/Volumes/scratch/lessard/bk/test_dir/GSM4630031_Normal_kidney/")
Normal_kidney_seurat_object <- CreateSeuratObject(counts = Normal_kidney, project = "Normal_kidney")

# Let's check the Seurat object
Normal_kidney_seurat_object
An object of class Seurat 
33538 features across 4800 samples within 1 assay
Active assay: RNA (33538 features, 0 variable features)

# Let's check the meta.data
head(Normal_kidney_seurat_object@meta.data)
                      orig.ident nCount_RNA nFeature_RNA
AAACCCACAGGCAATG-1 Normal_kidney      14514         1891
AAACCCAGTAGCGTAG-1 Normal_kidney      12890         2561
AAACCCAGTAGTTAGA-1 Normal_kidney      29267          326
AAACCCAGTCTACATG-1 Normal_kidney      15730         3288
AAACCCATCATTTCGT-1 Normal_kidney      14574         2879
AAACCCATCGATTTCT-1 Normal_kidney      23930          138

ADD REPLY • link 16 months ago by bk11 ★ 3.0k