I have this GEO dataset (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE141183), which contains files in the formats barcodes.tsv.gz, genes.tsv.gz, and matrix.mtx.gz. I used Seurat to integrate these files, and I can run analyses on the resulting object. However, I am facing an issue: I need to generate a gene expression matrix where rows represent genes and columns represent sample IDs, but the matrix I get from Seurat has genes as rows and cell barcodes (nucleotide sequences) as columns.
To address this, I executed the following commands:
gene_expression_matrix <- GetAssayData(object = mtx_obj.seurat.obj, slot = "counts")
write.table(as.matrix(gene_expression_matrix), file = "gene_expression_matrix.tsv", sep = "\t", row.names = TRUE, col.names = TRUE)
I then ran Salmon on this matrix to obtain an expression matrix where genes are rows and sample IDs are columns, but I am getting an error at this step.
Here is how I ran Salmon on the gene expression matrix file:
salmon quant -i salmon_index -l A -r gene_expression_matrix.tsv -o salmon_output
Now, I have two questions:
Are these steps correct for obtaining the gene expression matrix? If yes, how can I extract the resulting matrix?
Edit: I need to convert the cell barcodes to the corresponding sample IDs.
Edit: I need to run PACNet (http://ec2-44-201-176-192.compute-1.amazonaws.com/PACNet/webApp/), the updated version of CellNet. The input should be a metadata table (which is easy to create) and an expression matrix. The expression matrix must have gene symbols as row names and sample names as column names. In my case, I have gene symbols as row names and cell barcodes as column names. My metadata has a sample ID, and according to PACNet, "column names of the expression matrix must match the sample_name column of the sample metadata table". So, I need to convert the cell barcodes to sample IDs.
For example, if I have 1 sample and 10 genes, then I should have a 10x1 matrix (10 rows and 1 column), but when I read the files (with either readMM or Read10X) I get a 10x100 matrix (for example), because there are 100 cell barcodes.
Thanks
Thanks for your reply. I have tried these suggestions before and had the same issue. Let me elaborate a bit more.
I need to run PACNet (http://ec2-44-201-176-192.compute-1.amazonaws.com/PACNet/webApp/), the updated version of CellNet. The input should be a metadata table (which is easy to create) and an expression matrix. The expression matrix must have gene symbols as row names and sample names as column names. In my case, I have gene symbols as row names and cell barcodes as column names. My metadata has a sample ID, and according to PACNet, "column names of the expression matrix must match the sample_name column of the sample metadata table". So, I need to convert the cell barcodes to sample IDs.
For example, if I have 1 sample and 10 genes, then I should have a 10x1 matrix (10 rows and 1 column), but when I read the files (with either readMM or Read10X) I get a 10x100 matrix (for example), because there are 100 cell barcodes.
Thank you so much
It seems to me that this is a method developed for bulk RNA-seq, which is where one typically calls a column a "sample". In single-cell data, however, each column is a cell. You might want to aggregate (pseudobulk) your cells into samples somehow, but I cannot tell how this should be done for your study. Generally, one would sum the counts per gene across all cells belonging to each group.
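As a rough illustration of the pseudobulking idea, here is a minimal sketch in R on a toy matrix. The barcodes, the gene names, and the barcode-to-sample mapping (sample_of_cell) are made up for this example. With real data the sample label for each cell would have to come from your own metadata (for instance a column of the Seurat object's meta.data), and whether summing raw counts is appropriate for PACNet is an assumption you would need to check.

```r
library(Matrix)

# Toy counts matrix: 3 genes (rows) x 4 cell barcodes (columns).
# Values are filled column by column.
counts <- Matrix(
  c(1, 0, 2,
    0, 3, 1,
    4, 0, 0,
    2, 2, 5),
  nrow = 3, sparse = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneC"),
                  c("AAAC-1", "AAAG-1", "CCCT-1", "GGGA-1"))
)

# Hypothetical mapping from cell barcode to sample ID (an assumption for
# this example; in practice it comes from your experimental design).
sample_of_cell <- c("AAAC-1" = "sample1", "AAAG-1" = "sample1",
                    "CCCT-1" = "sample2", "GGGA-1" = "sample2")
groups <- factor(sample_of_cell[colnames(counts)])

# Sum the counts per gene over all cells that belong to each sample.
pseudobulk <- sapply(levels(groups), function(s) {
  Matrix::rowSums(counts[, groups == s, drop = FALSE])
})
rownames(pseudobulk) <- rownames(counts)

# pseudobulk is now a genes x samples matrix whose column names are the
# sample IDs rather than the cell barcodes.
print(pseudobulk)
```

The result has one column per sample, so its column names can be matched against a sample_name column in the metadata. Seurat also provides AggregateExpression(), which may do a similar per-group summation directly on the object; check its documentation for the grouping arguments in your Seurat version.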
Thanks for your suggestion