Hi,
I'm trying to perform differential gene expression analysis in R on single-cell RNA sequencing data, to determine which genes are differentially expressed between clusters (cell types) in an osteosarcoma tissue sample.
The public dataset can be found here: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM4952363. It has 3 data files: barcodes, features, and matrix.
I have combined them into one expression matrix, with the barcodes file as column names and the features file as row names.
I have done this using these commands:
library(Matrix)
mat <- Matrix::readMM("~/Downloads/GSM4952363_OS_1_matrix.mtx.gz")
features <- read.delim("~/Downloads/GSM4952363_OS_1_features.tsv.gz",
header=FALSE)
barcodes <- read.delim("~/Downloads/GSM4952363_OS_1_barcodes.tsv.gz",
header=FALSE)
colnames(mat) <- barcodes[,1]
rownames(mat) <- features[,2]
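Before building the Seurat object it may be worth checking that the three files line up. The sketch below uses a tiny made-up matrix (the toy counts, gene symbols, and barcodes are assumptions for illustration, not the GEO data) to show two checks: the dimensions must match, and gene symbols in the second features column can repeat, so they are made unique before being used as row names.

```r
library(Matrix)

# Toy stand-ins for the three 10x files (hypothetical values, not the GEO data).
toy_mat <- sparseMatrix(i = c(1, 2, 3), j = c(1, 2, 2), x = c(5, 1, 2),
                        dims = c(3, 2))
toy_features <- data.frame(V1 = c("ENSG01", "ENSG02", "ENSG03"),
                           V2 = c("GENE_A", "GENE_B", "GENE_A"))
toy_barcodes <- data.frame(V1 = c("AAAC-1", "AAAG-1"))

# Rows are genes and columns are cells, so the counts must agree.
stopifnot(nrow(toy_mat) == nrow(toy_features),
          ncol(toy_mat) == nrow(toy_barcodes))

# Gene symbols can repeat; make.unique() appends .1, .2, ... to duplicates.
rownames(toy_mat) <- make.unique(toy_features[, 2])
colnames(toy_mat) <- toy_barcodes[, 1]
```

The same two steps can be applied to mat, features, and barcodes from the commands above.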
I then followed this workflow https://satijalab.org/seurat/articles/pbmc3k_tutorial.html to perform clustering and differential gene expression in Seurat, using almost the same commands as the workflow. However, the UMAP plot I made showing the different cell types doesn't match what was published in the authors' paper: I have 6 cell types and the paper shows 9. The commands I used are below. If anyone can see where I've gone wrong, it would be much appreciated.
Commands for clustering in Seurat:
library(dplyr)
library(Seurat)
library(patchwork)
pbmc <- CreateSeuratObject(counts = mat, project = "pbmc3k", min.cells = 3, min.features = 200)
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
VlnPlot(pbmc, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
and the rest of the commands are the same as the workflow.
Would it be best to try a different workflow?
Thank you
Hello guys
First of all, for this type of data you have to put all three uncompressed files into one directory and pass that directory's path to Seurat's built-in Read10X function.
After that, create a Seurat object with CreateSeuratObject.
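The commands from this reply were not preserved in the thread. A minimal sketch of the two steps, assuming they were Read10X followed by CreateSeuratObject; the function name load_os_sample, the directory path, and the project name are hypothetical, and the filtering thresholds are simply the ones from the pbmc3k tutorial rather than values tuned for this dataset:

```r
# Hypothetical helper: read one 10x directory and wrap it in a Seurat object.
load_os_sample <- function(data_dir, project = "OS_1") {
  # Read10X() reads the barcodes, features, and matrix files in data_dir
  # and, for a single feature type, returns a sparse genes-by-cells matrix.
  counts <- Seurat::Read10X(data.dir = data_dir)

  # Thresholds copied from the pbmc3k tutorial (assumption, not tuned).
  Seurat::CreateSeuratObject(counts = counts, project = project,
                             min.cells = 3, min.features = 200)
}

# Usage (the path is a placeholder):
# os <- load_os_sample("path/to/10x_directory")
```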
The rest of the pipeline follows their excellent vignettes.
Note that the number of clusters can differ depending on the resolution you pass to Seurat's FindClusters function.
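To see how strongly the resolution affects the cluster count, one option is to try several values and compare. This is a rough sketch, assuming a Seurat object that has already been normalised, scaled, and run through RunPCA as in the tutorial; the helper name clusters_by_resolution, the resolution values, and dims = 1:10 are illustrative choices, not settings taken from the paper.

```r
# Hypothetical helper: number of clusters found at each resolution.
# `obj` is assumed to be a Seurat object with a PCA reduction already computed.
clusters_by_resolution <- function(obj,
                                   resolutions = c(0.2, 0.5, 0.8, 1.2),
                                   dims = 1:10) {
  # Build the nearest-neighbour graph once; it does not depend on resolution.
  obj <- Seurat::FindNeighbors(obj, dims = dims, verbose = FALSE)

  n_clusters <- sapply(resolutions, function(res) {
    clustered <- Seurat::FindClusters(obj, resolution = res, verbose = FALSE)
    # Idents() returns a factor of cluster labels, one per cell.
    nlevels(Seurat::Idents(clustered))
  })
  setNames(n_clusters, resolutions)
}

# Usage:
# clusters_by_resolution(pbmc)
```

Higher resolutions generally give more clusters, so going from 6 to 9 groups may only need a larger value, although differences in filtering and in the number of PCs can also change the result.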
Hope it helps, Milad Eidi
Thank you kindly for your help.
I received an error message when I used this command:
data <- Read10X(data.dir = "plot")
Error in Read10X(data.dir = "plot") : Barcode file missing. Expecting barcodes.tsv.gz
In the folder plot I have the three files as: "GSM4952363_OS_1_features.tsv", "GSM4952363_OS_1_matrix.mtx", "barcodes.tsv".
Do you know where I've gone wrong?
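The error message says Read10X is expecting a file called barcodes.tsv.gz, which suggests it looks for fixed, compressed file names (barcodes.tsv.gz, features.tsv.gz, matrix.mtx.gz) with no sample prefix. A minimal sketch of one way to set that up, assuming the original compressed downloads are still available; the helper name stage_10x_files and the paths in the usage lines are hypothetical:

```r
# Hypothetical helper: copy prefixed GEO downloads into a folder under the
# plain file names that Read10X() reports it is expecting.
stage_10x_files <- function(src_dir, prefix, dest_dir) {
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
  expected <- c("barcodes.tsv.gz", "features.tsv.gz", "matrix.mtx.gz")

  # Source files are assumed to be named <prefix><expected name>.
  from <- file.path(src_dir, paste0(prefix, expected))
  missing <- from[!file.exists(from)]
  if (length(missing) > 0) {
    stop("Missing input file(s): ", paste(missing, collapse = ", "))
  }

  ok <- file.copy(from, file.path(dest_dir, expected), overwrite = TRUE)
  stopifnot(all(ok))
  invisible(dest_dir)
}

# Usage (paths are assumptions based on the file names in the question):
# stage_10x_files("~/Downloads", "GSM4952363_OS_1_", "plot")
# data <- Seurat::Read10X(data.dir = "plot")
```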