Link to the previous post for context: 0 Features for COVID-19 ExpressionSet through GEOQuery
Hi all,
I am analyzing a COVID-19 dataset (GSE150728), and my aim is to create clustering maps from it.
In the previous post, I was advised to download the RDS files in GSE150728_RAW.tar from the GEO website instead of using GEOquery.
After doing this (and untarring the archive), I decided to bind the separate files into a single object for my later analyses.
library(Biobase)
library(SingleCellExperiment)
library(raster)
files <- list.files(path = "C:/Users/arjun/GSE150728", pattern = "\\.rds$", full.names = TRUE) # lists all the .rds files in the folder
stack <- do.call("rbind", lapply(files, readRDS)) # reads each file and row-binds the resulting objects
I then converted the stack object into a SingleCellExperiment object and tried running t-SNE on the new SCE.
library(SingleCellExperiment)
library(tidyverse)
library(Seurat)
sce <- SingleCellExperiment(stack)
library(scran)
library(scRNAseq)
library(DropletUtils)
library(Matrix)
RunTSNE(sce)
However, I am met with the following error:
00:01:59 UMAP embedding parameters a = 0.9922 b = 1.112
00:01:59 Read 13 rows and found 3 numeric columns
Error in uwot(X = X, n_neighbors = n_neighbors, n_components = n_components, :
n_neighbors must be smaller than the dataset size
After this setback, I decided to examine the contents of one of the .rds files in my working folder (using readRDS) and here is what I got:
$exon
34463 x 17057 sparse Matrix of class "dgCMatrix"
[[ suppressing 66 column names ‘TTGCTAAGCAGT’, ‘AACGACGGGTCT’, ‘AACGGGGGCGAG’ ... ]]
[[ suppressing 66 column names ‘TTGCTAAGCAGT’, ‘AACGACGGGTCT’, ‘AACGGGGGCGAG’ ... ]]
KLK11 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
CCDC159 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
C7orf50 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
RP11-1437A8.4 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
EMP2 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
IFI27L2 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 . . . . . . . . . . . . . . . . . ......
ANTXR1 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
RP11-586K12.10 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
..............................
........suppressing 16991 columns and 34448 rows in show(); maybe adjust 'options(max.print= *, width = *)'
..............................
[[ suppressing 66 column names ‘TTGCTAAGCAGT’, ‘AACGACGGGTCT’, ‘AACGGGGGCGAG’ ... ]]
RNU6-1128P . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
SNORD116-23 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
RNU4-38P . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
RP11-527L4.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
RP11-49K24.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
RP11-134J21.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
RP11-344P13.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
$intron
23682 x 17057 sparse Matrix of class "dgCMatrix"
[[ suppressing 66 column names ‘TTGCTAAGCAGT’, ‘AACGACGGGTCT’, ‘AACGGGGGCGAG’ ... ]]
[[ suppressing 66 column names ‘TTGCTAAGCAGT’, ‘AACGACGGGTCT’, ‘AACGGGGGCGAG’ ... ]]
RP11-505K9.4 1 . . . . . . . . . . . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
RP11-718G2.5 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
C3orf22 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
PIGZ 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
C19orf18 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 . . . . . . . . . . . . . . . . . ......
SLC7A9 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
C19orf57 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
RP11-45K10.2 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
..............................
........suppressing 16991 columns and 23667 rows in show(); maybe adjust 'options(max.print= *, width = *)'
..............................
[[ suppressing 66 column names ‘TTGCTAAGCAGT’, ‘AACGACGGGTCT’, ‘AACGGGGGCGAG’ ... ]]
ABHD11 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
RGL2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
EMD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
RP11-37E23.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
CTC-493L21.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
COX6A2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
AL928768.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
$spanning
20176 x 17057 sparse Matrix of class "dgCMatrix"
[[ suppressing 66 column names ‘TTGCTAAGCAGT’, ‘AACGACGGGTCT’, ‘AACGGGGGCGAG’ ... ]]
[[ suppressing 66 column names ‘TTGCTAAGCAGT’, ‘AACGACGGGTCT’, ‘AACGGGGGCGAG’ ... ]]
STX4 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
GPAM 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
RPL3 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
NDUFAF1 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
MAPK1 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
CYP24A1 . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
NUDT2 . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
C1orf52 . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
..............................
........suppressing 16991 columns and 20161 rows in show(); maybe adjust 'options(max.print= *, width = *)'
..............................
[[ suppressing 66 column names ‘TTGCTAAGCAGT’, ‘AACGACGGGTCT’, ‘AACGGGGGCGAG’ ... ]]
ZNF784 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
SPATA19 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
KIAA0101 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
WBP5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
PTCHD3P1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
TMED2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
CTD-2589M5.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
What is going wrong here in my process? Also, could someone explain what exactly is in these .rds files and how they relate to creating clustering maps?
For standard scRNA-seq analysis, splitting into intron, exon, and spanning counts is probably not necessary. You can combine the three matrices into one count matrix. Also, you shouldn't be row-binding the matrices for different samples. I'm not familiar with scran, but you likely want to make a separate SingleCellExperiment for each sample, and then merge the experiments together.
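One possible way to do that combination, as a sketch under assumptions: the three matrices in the printed output list different gene sets (34463, 23682 and 20176 rows), so they cannot simply be added. The code below aligns each matrix to the union of gene names and then sums them. `expand_rows` and `combine_counts` are hypothetical helper names, the toy matrices stand in for one .rds object, and the code assumes all three matrices share the same cell barcodes in the same order.

```r
library(Matrix)

# Hypothetical helper: expand a sparse matrix to a common set of row names,
# filling genes that are absent from `m` with zeros.
expand_rows <- function(m, all_genes) {
  idx <- match(rownames(m), all_genes)
  # selector is length(all_genes) x nrow(m); multiplying places each row
  # of `m` at the position of its gene in `all_genes`.
  selector <- sparseMatrix(i = idx, j = seq_len(nrow(m)), x = 1,
                           dims = c(length(all_genes), nrow(m)))
  out <- selector %*% m
  dimnames(out) <- list(all_genes, colnames(m))
  out
}

# Hypothetical helper: sum a list of count matrices over the union of genes.
combine_counts <- function(x) {
  # Assumption: identical barcodes, in the same order, in every matrix.
  same_cells <- vapply(x, function(m) identical(colnames(m), colnames(x[[1]])),
                       logical(1))
  stopifnot(all(same_cells))
  all_genes <- Reduce(union, lapply(x, rownames))
  Reduce(`+`, lapply(x, expand_rows, all_genes = all_genes))
}

# Toy stand-in for one sample (not real data).
cells <- c("c1", "c2")
toy <- list(
  exon     = sparseMatrix(i = c(1, 2), j = c(1, 2), x = c(1, 2), dims = c(2, 2),
                          dimnames = list(c("A", "B"), cells)),
  intron   = sparseMatrix(i = c(1, 2), j = c(1, 1), x = c(3, 4), dims = c(2, 2),
                          dimnames = list(c("B", "C"), cells)),
  spanning = sparseMatrix(i = 1, j = 2, x = 5, dims = c(1, 2),
                          dimnames = list("A", cells))
)
combined <- combine_counts(toy)
```

For a real file, something like `combine_counts(readRDS(files[1]))` would be the equivalent call, provided the object really is a list of three named sparse matrices as the printed output suggests.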
I would not combine them. I would subset specifically for the exonic reads.
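A rough sketch of the exon-only route, merging samples by columns (cells) rather than rows: `merge_samples` is a hypothetical helper, the toy objects stand in for the real .rds contents, and restricting to genes shared by every sample is an assumption made so that the column-bind is valid.

```r
library(Matrix)

# Hypothetical helper: pull one slot (e.g. "exon") from each sample,
# keep the genes present in every sample, and bind cells column-wise.
merge_samples <- function(samples, slot = "exon") {
  mats <- lapply(samples, `[[`, slot)
  shared <- Reduce(intersect, lapply(mats, rownames))
  mats <- Map(function(m, id) {
    m <- m[shared, , drop = FALSE]
    # Prefix barcodes with the sample name so they stay unique after merging.
    colnames(m) <- paste(id, colnames(m), sep = "_")
    m
  }, mats, names(samples))
  do.call(cbind, unname(mats))
}

# Toy stand-ins for two samples (not real data).
s1 <- list(exon = sparseMatrix(i = c(1, 2), j = c(1, 2), x = c(1, 2), dims = c(2, 2),
                               dimnames = list(c("A", "B"), c("x", "y"))))
s2 <- list(exon = sparseMatrix(i = 1, j = 1, x = 7, dims = c(2, 1),
                               dimnames = list(c("B", "C"), "x")))
merged <- merge_samples(list(s1 = s1, s2 = s2))

# With the real files, the input could be built along these lines:
#   samples <- lapply(files, readRDS)
#   names(samples) <- basename(files)
```

The merged matrix has genes as rows and all cells from all samples as columns, which is the layout count-based single-cell tools generally expect.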
It's been shown that intronic reads in scRNA-seq predominantly come from nascent RNA, and thus represent legitimately sequenced transcripts. If you exclude introns you could be masking interesting phenomena, such as an increasing or decreasing transcription rate for a gene, or cell types with increased intron retention. See the RNA velocity literature for more information.
This is not correct. You should only include intronic reads if you are interested in them. Counting the intronic reads will inflate your estimates of "steady-state" mRNA - which is the focus of most transcriptomic studies.
I have tried your command and it results in the following error message:
I also tried converting the first .rds object into a SingleCellExperiment using SingleCellExperiment(). I get the following error message:
What is causing this? Are the matrices uneven, and if so, how do I fix it practically? Also, how do I combine the three count matrices to make one?
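A small diagnostic sketch that may help here, using toy data in place of the real files: it prints the dimensions of each matrix in one object and shows what `rbind` does to lists of matrices. The toy object `obj` is made up for illustration; the link to the "Read 13 rows and found 3 numeric columns" message (one row per file, one column per list element) is an inference from the output above, not something confirmed.

```r
library(Matrix)

# Toy stand-in for the object returned by readRDS() on one file:
# a list of three sparse matrices with different numbers of genes.
obj <- list(
  exon     = Matrix(0, 5, 4, sparse = TRUE),
  intron   = Matrix(0, 3, 4, sparse = TRUE),
  spanning = Matrix(0, 2, 4, sparse = TRUE)
)

# Rows and columns of each matrix, one column per list element.
dims <- vapply(obj, dim, integer(2))
print(dims)

# rbind() on plain lists does not stack the matrices inside them;
# it builds a list-matrix with one row per list and one cell per element.
stacked <- do.call("rbind", list(obj, obj))
print(dim(stacked))
print(is.list(stacked))
```

If the real objects behave like this toy one, the matrices inside a file differ in their gene (row) counts but not in their cell (column) counts, and the row-bound `stack` holds whole matrices as cells rather than counts.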