I am working on creating a signature matrix based on an input (a given single cell RNA-seq dataset). To clarify, my input matrix has the following format:
AAACGGGAGATCCCGC.1 AAATGCCTCCAATGGT.1 AACCATGTCAGTGCAT.1 AACCATGTCTGTACGA.1 AACCGCGTCCGCATAA.1
LGALSL 0 0 0 0 0
CD247 0 0 0 0 1
XKR6 0 5 0 1 2
KLHL23 0 4 0 0 0
MTHFSD 0 0 0 0 1
KHK 0 0 0 0 0
TCERG1 0 2 0 1 1
DNAJA3 1 0 0 1 0
TRAPPC3L 0 0 0 0 0
PAAF1 0 1 0 1 1
I have 20,000 rows (genes) and 30,000 columns (cells pooled from different individuals). I want to create a list of differentially expressed genes, or more formally a signature matrix. The final goal is to estimate cell type proportions.
Tools such as CIBERSORTx and DWLS do the job, and they internally create signature matrices (which include fewer genes than the original input). However, these tools become very slow when the input file is large.
Are there any other ways to create a signature matrix? In other words, for each cell type, is there any quick way of identifying a list of genes?
Here is my potential solution, but it needs improvement. DWLS uses a two-step procedure. In the first step, it selects genes based on a chosen fold-change threshold; in the second step, it excludes more genes based on p-values generated by MAST. This second step is time-consuming. Is there a faster replacement for the second step?
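To make the two-step idea concrete, here is a minimal sketch in plain Python. It is not what DWLS does internally: the MAST step is swapped for a much cheaper Welch t-statistic, the "fold change" is approximated by the difference of mean log2(count + 1) values, and no multiple-testing correction is applied. The function names (`select_markers`, `signature_matrix`), the thresholds, the cell-type labels, and the toy counts are all hypothetical; the real input would also need per-cell cell-type labels, which the matrix above does not contain. For 20,000 x 30,000 data the same logic would have to be vectorized (for example with sparse matrices) rather than looped like this.

```python
import math
from statistics import mean, variance


def select_markers(counts, labels, min_log2fc=1.0, min_t=2.0, top_n=50):
    """Pick marker genes per cell type (illustrative, not the DWLS method).

    counts: dict mapping gene -> list of raw counts, one per cell
    labels: list of cell-type labels, one per cell (assumed to be known)
    """
    cell_types = sorted(set(labels))
    hits = {ct: [] for ct in cell_types}
    for gene, values in counts.items():
        logged = [math.log2(v + 1.0) for v in values]
        for ct in cell_types:
            inside = [x for x, lab in zip(logged, labels) if lab == ct]
            outside = [x for x, lab in zip(logged, labels) if lab != ct]
            if len(inside) < 2 or len(outside) < 2:
                continue  # variance needs at least two cells per group
            # Step 1: cheap fold-change filter on log-transformed means.
            log2fc = mean(inside) - mean(outside)
            if log2fc < min_log2fc:
                continue
            # Step 2 (assumption): Welch t-statistic instead of MAST.
            se = math.sqrt(variance(inside) / len(inside)
                           + variance(outside) / len(outside))
            t_stat = log2fc / se if se > 0 else math.inf
            if t_stat >= min_t:
                hits[ct].append((t_stat, gene))
    # Keep the top_n strongest genes per cell type.
    return {ct: [g for _, g in sorted(found, reverse=True)[:top_n]]
            for ct, found in hits.items()}


def signature_matrix(counts, labels, markers):
    """Mean raw expression of every selected gene in every cell type."""
    cell_types = sorted(set(labels))
    genes = sorted({g for selected in markers.values() for g in selected})
    return {
        g: {ct: mean([v for v, lab in zip(counts[g], labels) if lab == ct])
            for ct in cell_types}
        for g in genes
    }


# Toy data: made-up counts and made-up labels, for illustration only.
toy_counts = {
    "CD247": [5, 6, 7, 0, 0, 1],
    "XKR6": [0, 1, 0, 4, 5, 6],
    "KHK": [1, 1, 1, 1, 1, 1],
}
toy_labels = ["typeA", "typeA", "typeA", "typeB", "typeB", "typeB"]

markers = select_markers(toy_counts, toy_labels)
sig = signature_matrix(toy_counts, toy_labels, markers)
print(markers)
print(sig)
```

Because the fold-change filter runs first, the test statistic is only computed for genes that already pass the threshold, which is where most of the time saving would come from. Whether a t-statistic is an acceptable stand-in for MAST on sparse single-cell counts is a judgment call that should be checked against the DWLS output on a subset of the data.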
What do you mean by signature matrix? Cell type or state-specific markers? An example would be helpful here.
Thanks for your comment. I updated my question.
Hi Ali,
If by signature matrix, you mean something like this:
Then Seurat could be a good place to start. Check these out:
https://satijalab.org/seurat/articles/install.html
https://satijalab.org/seurat/articles/get_started.html
https://satijalab.org/seurat/articles/pbmc3k_tutorial.html
If you're worried about speed, check out Python packages like ScanPy:
https://scanpy.readthedocs.io/en/stable/
Both packages (Seurat and ScanPy) have parallelization options that make them run faster.
Many thanks for the info. I updated my original question with more information.