I am using Seurat to cluster data that has previously been filtered, aligned and turned into a DGE (digital gene expression) matrix by the Drop-seq alignment pipeline from Drop-seq tools. This has created a file sample_DGE.txt.gz. I then want to cluster my data and do a QC analysis by calculating the percentage of mitochondrial genes. I am following the Seurat clustering tutorial found here: https://satijalab.org/seurat/pbmc3k_tutorial.html
In this tutorial they use the files barcodes.tsv, genes.tsv and matrix.mtx generated by 10x Genomics as raw data, and read them with the command Read10X(). I have generated these three files from our DGE data, inspired by this Biostars page: A: Storing a gene expression matrix in a matrix.mtx
It works fine, except that the row name title "GENE" is stored as a column name and saved into barcodes.tsv. This later becomes a problem in Seurat, because Seurat uses "GENE" as one of the cell barcodes when calculating the percent mitochondrial DNA per cell. Example below:
This, of course, makes it impossible to use VlnPlot, which fails with the error:
VlnPlot(object = pbmc, features.plot = c("nGene", "nUMI", "percent.mito"), nCol = 3)
Error in if(all(data[,feature] == data,feature)) { : missing value where TRUE/FALSE needed
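One plausible reading of this error (an assumption, since the matrix itself is not shown in this thread): if the spurious "GENE" barcode ends up as a column with no counts, its mitochondrial fraction is 0/0, which R evaluates to NaN, and a comparison on NaN inside `if()` gives "missing value where TRUE/FALSE needed". A minimal sketch with a made-up toy matrix, using the usual `^MT-` prefix for human mitochondrial genes:

```r
# Toy count matrix (hypothetical values); "GENE" is the spurious, empty barcode.
counts <- matrix(
  c(0, 5, 2,
    0, 1, 0,
    0, 4, 8),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("MT-CO1", "ACTB", "GAPDH"), c("GENE", "AAAC", "AAAG"))
)

# Fraction of each cell's counts that come from mitochondrial genes.
mito.genes <- grep(pattern = "^MT-", x = rownames(counts), value = TRUE)
percent.mito <- colSums(counts[mito.genes, , drop = FALSE]) / colSums(counts)

print(percent.mito)  # the "GENE" entry is NaN because it is 0 / 0

# Barcodes whose percentage could not be computed.
bad <- names(percent.mito)[is.na(percent.mito)]
print(bad)
```

If this is what is happening, listing the barcodes with missing values, as in the last two lines, is a quick way to confirm it before plotting.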
Simply removing "GENE" manually from the barcodes.tsv file creates a dimension error at the Read10X step.
pbmc.data <- Read10X(data.dir = "dir/to/barcode_matrix_and_gene_files")
Error in dimnamesGets(x, value) : invalid dimnames given for "dgTMatrix" object
stop(gettextf("invalid dimnames given for %s object", dQuote(class(x))), domain = NA)
dimnamesGets(x, value)
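The dimension error is consistent with barcodes.tsv and matrix.mtx no longer agreeing: if the matrix still has a column for "GENE", deleting the name leaves one barcode too few. A small sketch of that mismatch, using the Matrix package on a made-up 2 x 3 matrix (the real files are not shown here, and treating the extra column as the first one is an assumption):

```r
library(Matrix)

# 2 genes x 3 columns in triplet (dgTMatrix) form; values are made up.
m <- spMatrix(nrow = 2, ncol = 3,
              i = c(1, 2, 2), j = c(1, 2, 3), x = c(3, 1, 7))

genes    <- c("ACTB", "GAPDH")
barcodes <- c("AAAC", "AAAG")  # one name short, as after deleting "GENE"

# Assigning column names of the wrong length fails.
res <- tryCatch({
  dimnames(m) <- list(genes, barcodes)
  "ok"
}, error = function(e) conditionMessage(e))
print(res)

# Dropping the extra column from the matrix as well (assumed here to be
# the first column) makes names and dimensions agree again.
m2 <- m[, -1, drop = FALSE]
dimnames(m2) <- list(genes, barcodes)
print(dim(m2))
```

The point is only that the name files and the matrix have to be edited together; editing barcodes.tsv alone leaves the matrix with a column that has no name.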
So my question is: does anyone know a workaround for this problem? Or is there an equivalent to Read10X(), say ReadDGE() or ReadDropseq(), that can be used directly on my DGE file?
Thank you @Igor, that is the answer to a question I've been pondering for a long time. However, ?CreateSeuratObject uses this example:
which in my case would be:
but it yields the error:
If you are not sure what a function does, you can check by putting a ? in front of it. For example, ?system.file. That will tell you that system.file takes "character vectors, specifying subdirectory and file(s) within some package". In the example, they are using pbmc_raw.txt from the Seurat package. Your file is not stored in the Seurat package. You should specify the exact path where it is. Using system.file is not needed.
Thank you @Igor, got it! :)
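For later readers, the advice about giving the exact path can be sketched roughly as follows. This assumes the usual Drop-seq DGE layout, where the first column is headed GENE and the remaining columns are cell barcodes; the toy file written to a temporary directory and its values are made up. The commented `CreateSeuratObject` lines are an assumption about the installed version: the count matrix argument is `raw.data` in Seurat 2.x and `counts` from 3.0 on.

```r
# Write a tiny, made-up DGE file so the example runs on its own.
dge.path <- file.path(tempdir(), "sample_DGE.txt.gz")
con <- gzfile(dge.path, open = "wt")
writeLines(c("GENE\tAAAC\tAAAG",
             "MT-CO1\t5\t2",
             "ACTB\t1\t0",
             "GAPDH\t4\t8"), con)
close(con)

# read.table() reads gzip-compressed files transparently. row.names = 1
# turns the GENE column into row names, so "GENE" never becomes a barcode.
dge <- read.table(file = dge.path, header = TRUE, row.names = 1,
                  sep = "\t", check.names = FALSE)
counts <- as.matrix(dge)

print(dim(counts))       # genes x cells
print(colnames(counts))  # cell barcodes only

# Then pass the matrix to Seurat directly (argument name depends on version):
# pbmc <- CreateSeuratObject(raw.data = counts)   # Seurat 2.x
# pbmc <- CreateSeuratObject(counts = counts)     # Seurat 3.0 and later
```

Reading the table this way skips the intermediate barcodes.tsv, genes.tsv and matrix.mtx files entirely, which is where the stray "GENE" entry came from.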
Thanks @chilifan. I started working with scRNA-seq recently and was stuck on the same issue. You saved my life :-).