Retrieving single-cell dataset from GEO database
16 months ago

I want to retrieve the dataset GSE152938 from the GEO database and convert it to a Seurat object. However, my code below returned an empty dataframe.

Is the dataset itself empty, or did I import it incorrectly?

library(GEOquery)  # also attaches Biobase, which provides pData() and exprs()

gse <- getGEO("GSE152938", GSEMatrix = TRUE)

eset <- gse[[1]]
pData(eset)
exprs(eset)

Output:

exprs(eset)
     GSM4630027 GSM4630028 GSM4630029 GSM4630030 GSM4630031

Convert to Seurat:

library(Seurat)
seurat <- CreateSeuratObject(exprs(eset))

melissachua90 why did you delete this post?

16 months ago
ATpoint 86k

getGEO only retrieves expression data for arrays, not for NGS. You cannot retrieve NGS data such as counts via this package, so an empty eset is expected. You have to use what the authors provide as supplementary files on the GEO page. If that is not what you need, either email them for processed data or download the fastq files and preprocess them yourself.
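
To fetch the supplementary files from the terminal, here is a minimal sketch. It assumes GEO's FTP layout groups series into folders in which the last three digits of the accession are replaced by `nnn`; check this against the download links on the GEO page before relying on it. The function name `geo_suppl_url` is hypothetical and not part of any GEO tool.

```shell
#!/bin/sh
# geo_suppl_url: hypothetical helper that prints the presumed supplementary-file
# folder for a GEO series accession such as GSE152938.
# Assumption: series live under .../geo/series/<stem>/<accession>/suppl/,
# where <stem> is the accession with its last three digits replaced by "nnn".
geo_suppl_url() {
    acc=$1
    digits=${acc#GSE}                   # numeric part of the accession
    if [ "${#digits}" -gt 3 ]; then
        stem="GSE${digits%???}nnn"      # e.g. 152938 -> GSE152nnn
    else
        stem="GSEnnn"                   # accessions with three digits or fewer
    fi
    printf 'https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/suppl/\n' "$stem" "$acc"
}

# Example (commented out so that nothing is downloaded by accident):
# curl -O "$(geo_suppl_url GSE152938)GSE152938_RAW.tar"
```

If the printed URL does not resolve, copy the link from the "Supplementary file" table on the GEO page instead.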


Thanks for the advice. I downloaded the supplementary file and read it into R but it threw an error.

data <- data.table::fread("GSE152938_RAW.tar", data.table=F)

Traceback:

Avoidable 501.354 seconds. This file is very unusual: it ends abruptly without a final newline, and also its size is a multiple of 4096 bytes. Please properly end the last row with a newline using for example 'echo >> file' to avoid this  time to copy.
Error in data.table::fread("GSE152938_RAW.tar", data.table = F) : 
  embedded nul in string: ')]\v\xfe\xab+\xf0\035...'
In addition: Warning message:
In data.table::fread("GSE152938_RAW.tar", data.table = F) :
  Detected 3 column names but the data has 2 columns. Filling rows automatically. Set fill=TRUE explicitly to avoid this warning.
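
The error above comes from the file type: `GSE152938_RAW.tar` is a tar archive, not a delimited text file, so `fread` ends up parsing binary bytes. A minimal sketch of unpacking it first follows. It assumes the archive holds one `.tar.gz` per sample, as the file name `GSM4630031_Normal_kidney.tar.gz` mentioned in this thread suggests. The function name `unpack_geo_raw` is hypothetical.

```shell
#!/bin/sh
# unpack_geo_raw: hypothetical helper, not part of any GEO tooling.
# Usage: unpack_geo_raw ARCHIVE.tar DEST_DIR
unpack_geo_raw() {
    archive=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # List the members first so you can see what the archive holds.
    tar -tf "$archive" || return 1
    # Unpack the outer archive into the destination directory.
    tar -xf "$archive" -C "$dest" || return 1
    # Assumption: each sample is a nested .tar.gz; unpack each into its own folder.
    for sample in "$dest"/*.tar.gz; do
        [ -e "$sample" ] || continue    # no nested archives: nothing to do
        sample_dir="${sample%.tar.gz}"
        mkdir -p "$sample_dir"
        tar -xzf "$sample" -C "$sample_dir" || return 1
    done
}

# Example call (commented out so the snippet has no side effects):
# unpack_geo_raw GSE152938_RAW.tar GSE152938_RAW
```

If a sample archive already contains a top-level folder, the 10x files end up one level deeper than the per-sample folder created here.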

Hey, you can process each pathologic type separately as follows. Here is a demo for the normal kidney sample:

#Run these in your terminal
mkdir my_workingDir
cd my_workingDir
#Download `GSM4630031_Normal_kidney.tar.gz` into your working directory (my_workingDir), then extract it:
tar -xzvf GSM4630031_Normal_kidney.tar.gz
#You will see the barcodes.tsv.gz, features.tsv.gz and matrix.mtx.gz files

#Do these in R
library('Seurat')
#Point Read10X at the directory that holds the three extracted files (adjust the path to your system)
Normal_kidney <- Read10X("/Volumes/scratch/lessard/bk/test_dir/GSM4630031_Normal_kidney/")
Normal_kidney_seurat_object <- CreateSeuratObject(counts = Normal_kidney, project = "Normal_kidney")

#Let's check the Seurat object
Normal_kidney_seurat_object
An object of class Seurat 
33538 features across 4800 samples within 1 assay
Active assay: RNA (33538 features, 0 variable features)

#Let's check meta.data
head(Normal_kidney_seurat_object@meta.data)
                  orig.ident nCount_RNA nFeature_RNA
AAACCCACAGGCAATG-1 Normal_kidney      14514         1891
AAACCCAGTAGCGTAG-1 Normal_kidney      12890         2561
AAACCCAGTAGTTAGA-1 Normal_kidney      29267          326
AAACCCAGTCTACATG-1 Normal_kidney      15730         3288
AAACCCATCATTTCGT-1 Normal_kidney      14574         2879
AAACCCATCGATTTCT-1 Normal_kidney      23930          138
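
Before calling Read10X, it can help to confirm that the extracted folder really contains the three files listed above, since a missing file or a path that sits one folder too high will stop the import. A small sketch (the function name `check_10x_dir` is hypothetical):

```shell
#!/bin/sh
# check_10x_dir: hypothetical helper that reports which of the three
# 10x files named in this thread are missing from a directory.
# Returns 0 if all three are present, 1 otherwise.
check_10x_dir() {
    dir=$1
    status=0
    for f in barcodes.tsv.gz features.tsv.gz matrix.mtx.gz; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            status=1
        fi
    done
    return $status
}

# Example (commented out; adjust the path to your own folder):
# check_10x_dir GSM4630031_Normal_kidney && echo "ready for Read10X"
```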