We have characterized a cell type whose functions differ from those of other cells in general. We would like to derive a gene signature from 4 samples that were sequenced with RNA-seq.
I've been trying to learn a way to do this without comparing to other cell types, but if needed, this would be ok too.
Is there any literature or method that you guys recommend for creating a gene signature?
1 - We thought about finding the top expressed genes that have the least variance between samples, as one way.
2 - The other way was to find enriched pathways for the most expressed genes, select the pathways that are relevant to our biological condition and then apply the 1st filter (which risks losing relevant genes for the signature but keeps only genes relevant to both the cell type and the relevant biological pathway).
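To make option 1 concrete, here is a minimal sketch of what we have in mind, using only the Python standard library. The gene names, the expression values, and the `top_n` and `max_cv` cutoffs are made up for illustration, and the coefficient of variation is just one possible way to measure "least variance".

```python
from statistics import mean, stdev

def stable_high_genes(expr, top_n=2, max_cv=0.2):
    """Return the top_n most highly expressed genes whose coefficient of
    variation (sd / mean) across samples is at most max_cv.

    expr maps gene name -> list of normalized expression values,
    one value per sample (hypothetical input layout).
    """
    stats = {}
    for gene, values in expr.items():
        m = mean(values)
        # Genes with zero mean cannot have a meaningful CV; exclude them.
        cv = stdev(values) / m if m > 0 else float("inf")
        stats[gene] = (m, cv)
    # Keep only genes that are stable across samples.
    stable = [g for g, (m, cv) in stats.items() if cv <= max_cv]
    # Rank the stable genes by mean expression, highest first.
    stable.sort(key=lambda g: stats[g][0], reverse=True)
    return stable[:top_n]

# Toy data: 4 samples, invented values.
expr = {
    "GENE_A": [100.0, 105.0, 98.0, 102.0],  # high and stable
    "GENE_B": [500.0, 50.0, 900.0, 10.0],   # high but very variable
    "GENE_C": [40.0, 42.0, 39.0, 41.0],     # moderate and stable
    "GENE_D": [5.0, 6.0, 5.0, 6.0],         # low and stable
}
signature = stable_high_genes(expr, top_n=2, max_cv=0.2)
print(signature)
```

With these toy values, GENE_B is dropped for being too variable and GENE_D is ranked out for being too low.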
I've seen ssGSEA and other methods, but I don't understand how they can be useful for finding our own gene signature!
Is there any way to find the "similarity" between our 4 samples?
I've been trying to learn a way to do this without comparing to other
cell types, but if needed, this would be ok too.
Is there any literature or method that you guys recommend for creating
a gene signature?
A gene signature is a list of genes that distinguishes a biosample from others, so you will need to compare to other cell types. If it is a blood cell, then you will need to compare to other blood cell types and identify the genes whose expression distinguishes your new cell compared to other cells.
One approach could be to calculate the mean expression of each gene in the other cell types, compare it to the mean expression in your new cell type, and select the genes with the highest ratio.
Another approach could be to use a heatmap to cluster samples and genes. This will identify genes that are expressed only in your cell type, but you'll need to extract the clustering information from the gene dendrogram.
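As a rough sketch of the ratio approach, assuming normalized expression values are already available for both groups: the gene names and values below are invented, and the pseudocount and log2 scale are my own additions to keep genes with near-zero expression in the other cells from producing huge ratios.

```python
import math
from statistics import mean

def top_ratio_genes(target, others, top_n=2, pseudocount=1.0):
    """Rank genes by log2(mean in target / mean in others).

    target and others map gene name -> list of normalized expression
    values (hypothetical layout). A pseudocount is added to both means
    so that genes absent from the other cells do not divide by zero.
    """
    ratios = {}
    for gene, values in target.items():
        t = mean(values)
        o = mean(others[gene])
        ratios[gene] = math.log2((t + pseudocount) / (o + pseudocount))
    # Highest ratio first.
    ranked = sorted(ratios, key=ratios.get, reverse=True)
    return [(g, ratios[g]) for g in ranked[:top_n]]

# Toy data, invented values.
target = {"GENE_A": [63.0, 63.0], "GENE_B": [10.0, 12.0], "GENE_C": [31.0, 31.0]}
others = {"GENE_A": [1.0, 1.0, 1.0], "GENE_B": [10.0, 11.0, 12.0], "GENE_C": [7.0, 7.0, 7.0]}
top = top_ratio_genes(target, others)
print(top)
```

A plain ratio of means ignores variability within each group, so a proper differential expression test is the safer choice once you have enough replicates.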
Is there any way to find the "similarity" between our 4 samples?
I find PCA useful for visualising the similarity of the new samples to other known cell types. Apart from that, you can calculate correlation coefficients between samples.
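For the correlation part, a minimal sketch in plain Python could look like this. The sample names and values are invented, and I am assuming the values are already normalized and log-transformed. PCA is not shown here; in practice you would use an existing statistics package for that.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(samples):
    """samples maps sample name -> expression vector (same gene order)."""
    names = list(samples)
    return {a: {b: pearson(samples[a], samples[b]) for b in names} for a in names}

# Toy data: two replicates of the new cell type and one other cell type.
samples = {
    "new_1": [8.0, 2.0, 5.0, 1.0],
    "new_2": [7.5, 2.5, 5.5, 1.0],
    "other_1": [1.0, 6.0, 2.0, 9.0],
}
corr = correlation_matrix(samples)
for name, row in corr.items():
    print(name, {k: round(v, 3) for k, v in row.items()})
```

Replicates of the same cell type should correlate more strongly with each other than with the other cell types. Spearman correlation is a common alternative if you are worried about outliers.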
A signature is not a fixed and immutable list of genes; it is relative. Imagine a CD4 T-cell. There are hundreds of genes that separate it from a muscle cell, and on the between-organ/tissue scale those could serve as a signature to define T-cells. But if you compare CD4 with CD8 and Treg cells, then this list is completely useless, as it is far too broad to define the specific CD4 subset. That being said, you first have to decide the scale at which you want to compare your cells, and once that is decided you need suitable data from the other cells you want to compare with. One approach would be to run differential expression between all relevant cell populations and then, for each population, select genes that are overexpressed against all others. Or against all but one other, or two... as said, it is relative, and the stringency depends on your goal. For populations in a tight developmental continuum, such as hematopoietic progenitors, it is often hard to find "unique" markers that are present in only one cell type and absent from all others. Here, a signature is a combination of many genes rather than a set of unique individual genes. It all really depends on your setup.
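The "against all others, or all but one" selection could be sketched as below. This assumes the pairwise differential expression has already been run and summarized as log2 fold changes of your population versus each other population. The table, gene names, and the `min_lfc` and `allowed_misses` parameters are hypothetical, and a real analysis would also filter on adjusted p-values.

```python
def signature_genes(lfc, min_lfc=1.0, allowed_misses=0):
    """Select genes overexpressed against (almost) all other populations.

    lfc maps gene name -> {other population -> log2 fold change of the
    population of interest versus that other population}.
    allowed_misses is the number of comparisons in which a gene may fail
    the min_lfc cutoff: 0 means "against all others", 1 means
    "against all but one", and so on.
    """
    selected = []
    for gene, per_population in lfc.items():
        misses = sum(1 for value in per_population.values() if value < min_lfc)
        if misses <= allowed_misses:
            selected.append(gene)
    return selected

# Toy log2 fold changes for CD4 cells versus three other populations.
lfc_cd4 = {
    "GENE_A": {"CD8": 2.5, "Treg": 3.0, "muscle": 6.0},  # up against everything
    "GENE_B": {"CD8": 0.1, "Treg": 2.2, "muscle": 5.0},  # not up against CD8
    "GENE_C": {"CD8": 0.0, "Treg": 0.2, "muscle": 7.0},  # only up against muscle
}
strict = signature_genes(lfc_cd4)
relaxed = signature_genes(lfc_cd4, allowed_misses=1)
print(strict, relaxed)
```

GENE_C illustrates the point about scale: it separates T-cells from muscle but says nothing about the CD4 subset, so it only enters the signature if the stringency is relaxed a lot.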