Hi,
I'm trying to learn how to perform differential gene expression analysis in R on single-cell RNA sequencing (scRNA-seq) data, to determine which genes are differentially expressed between clusters (cell types) in an osteosarcoma tissue sample.
The public dataset can be found here: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM4952363. It has 3 data files: barcodes, features, and matrix.
Does anyone know how I might start the analysis? Also, in the count matrix, are the genes in rows and the cells in columns?
I have seen this workflow: http://bioconductor.org/books/3.14/OSCA.multisample/multi-sample-comparisons.html#performing-the-de-analysis but am still confused.
Thank you
Where exactly does the confusion start?
The .mtx.gz file is a sparse matrix (Matrix Market format) that stores the expression data. Load it with: mat <- Matrix::readMM("foo.mtx.gz")
The other two files are the cell names (colnames) and the gene names (rownames). You can simply read them with read.delim() or similar functions and then assign them to the matrix with rownames() and colnames(). From there you can construct your SingleCellExperiment; see ?SingleCellExperiment from the package of the same name for starters.
Thank you for your response. mat <- Matrix::readMM("foo.mtx.gz") gives the error message "no such file or directory".
How do I point this command at my matrix file, if that makes sense?
Also, would you be able to write out the commands for those steps, e.g. the read.delim() calls? That would be very helpful.
You obviously have to change the path to your actual file.
For the sake of your analysis, spend some time on R basics before diving deeper into this. With all due respect, if you do not know how to read a file, there is no way you are going to confidently apply any of the OSCA code.
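The "no such file or directory" error reported above means R could not find a file at the path given. Relative paths are resolved against the working directory, which getwd() reports.

Below is a minimal sketch of locating and reading the matrix. It also checks the orientation asked about at the start of the thread (genes in rows, cells in columns). So that it runs as-is, it first writes a small toy dataset to tempdir(). data_dir and the three file names are hypothetical placeholders; replace them with the real download folder and the exact GEO file names.

```r
library(Matrix)

# Hypothetical folder and file names: replace with the real location and
# the exact GEO file names. A toy dataset is written there first so that
# this sketch runs as-is.
data_dir      <- file.path(tempdir(), "GSM4952363")
matrix_path   <- file.path(data_dir, "matrix.mtx")
features_path <- file.path(data_dir, "features.tsv")
barcodes_path <- file.path(data_dir, "barcodes.tsv")

dir.create(data_dir, showWarnings = FALSE)
toy <- sparseMatrix(i = c(1, 2, 4, 3), j = c(1, 1, 2, 3), x = c(5, 1, 2, 7),
                    dims = c(4, 3))                 # toy: 4 genes x 3 cells
writeMM(toy, matrix_path)
writeLines(paste0("ENSG0000", 1:4, "\tGENE", 1:4), features_path)
writeLines(paste0("CELL", 1:3, "-1"), barcodes_path)

# "No such file or directory" means the path does not point at an existing
# file. Relative paths are resolved against the working directory.
print(getwd())
print(list.files(data_dir))   # the exact file names present in the folder
stopifnot(file.exists(matrix_path))

mat <- readMM(matrix_path)

# One line of the features file per row, one line of the barcodes file per
# column: if the line counts match dim(mat), genes are rows, cells columns.
n_genes <- length(readLines(features_path))
n_cells <- length(readLines(barcodes_path))
print(dim(mat))
stopifnot(nrow(mat) == n_genes, ncol(mat) == n_cells)
```

Checking file.exists() before calling readMM() turns a confusing connection error into a clear statement that the path is wrong.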
I have one more question, if that's alright. I am adding colnames (barcodes) and rownames (features) to the sparse matrix (mat), but I get an error message.
These are the commands:
library(Matrix)
mat <- Matrix::readMM("GSM4952363_OS_1_matrix.mtx")
features <- read.delim("GSM4952363_OS_1_features.tsv")
barcodes <- read.delim("GSM4952363_OS_1_barcodes.tsv")
rownames(mat) <- features
Error in dimnamesGets(x, value) : invalid dimnames given for “dgTMatrix” object
Do you know why I am getting this error message? Please help, and thanks again.
Because read.delim returns a data.frame, not a vector, you have to select the appropriate (first) column to get the feature names. This is what I mean by R basics. I am not saying that to offend or make fun of you, I really am not, but this is like trying to repair an engine without knowing how to hold a screwdriver: it simply is not going to work without the necessary basics.
You can check the output with head(features) and class(features) to see what the data are. You need rownames(mat) <- features[,1] and colnames(mat) <- barcodes[,1] to select the first column of those data.frames.
I will try to learn the basics of R. I have just tried rownames(mat) <- features[,1] and colnames(mat) <- barcodes[,1] and I still get an error message:
Error in dimnamesGets(x, value) : invalid dimnames given for “dgTMatrix” object
See how to assign row names and colnames to a sparse matrix
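A likely cause of the remaining error rests on one assumption: that the two .tsv files have no header line. That is the usual 10x Genomics layout, but it has not been checked for this particular GEO upload. If so, the problem is that read.delim() defaults to header = TRUE, so the first gene and the first barcode are consumed as column names. Each data.frame then has one row fewer than the matrix has rows (or columns), and the lengths no longer match.

The minimal sketch below reproduces this on toy files (hypothetical names written to tempdir(), not the real GEO files) and then reads them with header = FALSE.

```r
library(Matrix)

# Toy stand-ins for the three GEO files (hypothetical names in tempdir()).
# Assumption: like 10x output, the .tsv files have no header line.
d <- tempdir()
matrix_path   <- file.path(d, "toy_matrix.mtx")
features_path <- file.path(d, "toy_features.tsv")
barcodes_path <- file.path(d, "toy_barcodes.tsv")
writeMM(sparseMatrix(i = c(1, 2, 4, 3), j = c(1, 1, 2, 3), x = c(5, 1, 2, 7),
                     dims = c(4, 3)), matrix_path)    # toy: 4 genes x 3 cells
writeLines(paste0("ENSG0000", 1:4, "\tGENE", 1:4), features_path)
writeLines(paste0("CELL", 1:3, "-1"), barcodes_path)

mat <- readMM(matrix_path)

# Default header = TRUE: the first gene line becomes the column names,
# so the data.frame has one row fewer than the matrix has rows.
features_bad <- read.delim(features_path)
print(c(df_rows = nrow(features_bad), matrix_rows = nrow(mat)))
bad <- tryCatch({
  rownames(mat) <- features_bad[, 1]
  "ok"
}, error = function(e) conditionMessage(e))
print(bad)   # dimnames error; exact wording depends on the Matrix version

# header = FALSE keeps every line as data, so the lengths now match.
features <- read.delim(features_path, header = FALSE)
barcodes <- read.delim(barcodes_path, header = FALSE)

# Optional: convert the triplet form returned by readMM() (the dgTMatrix in
# the error above) to the compressed-column dgCMatrix form.
mat <- as(mat, "CsparseMatrix")
rownames(mat) <- features[, 1]   # column 1 of the features file (IDs here)
colnames(mat) <- barcodes[, 1]
print(mat)

# The SingleCellExperiment step mentioned earlier in the thread, run only
# if that Bioconductor package is installed.
if (requireNamespace("SingleCellExperiment", quietly = TRUE)) {
  sce <- SingleCellExperiment::SingleCellExperiment(assays = list(counts = mat))
  print(sce)
}
```

Which column of the features file to use for the row names (gene IDs or gene symbols) depends on what the real file contains; head(features) shows it.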