7 months ago
carolofharvest
I used the 'FindVariableFeatures' function from the Seurat package to identify variable features, but some of the genes appearing in the results are only expressed in 10-15 cells, and these cells are not even in a single cluster. What should I do in this situation?
What does this mean? If you run clustering on all cells, then every cell is assigned to one cluster. Check whether the strange cells are outliers in a QC metric. Or it may just be a cell type that is not abundant or is poorly captured by the single-cell technology.
I apologize for the confusion. What I meant to say is that, for example, the Trbv17 gene appears among the variable genes. However, when I plot the feature plot, this gene is expressed in only a very small number of cells, and these cells are scattered across random clusters on the UMAP plot.
Compared to the total number of cells, the number of cells expressing Trbv17 is very small. I don't understand how this gene can be detected as a highly variable gene in this case, since the majority of the cells don't express it.
You need to be clear about how Seurat defines highly variable genes here. Highly variable genes are genes that have very high expression in some cells and low or no expression in other cells. So in your case, Trbv17 is rightly picked as a variable gene: as your feature plot shows, it is expressed in only a few cells, which is exactly the pattern that gets selected. There is nothing wrong with it. Thank you.
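To make the point above concrete, here is a minimal sketch in plain Python. It is not Seurat's actual procedure: as far as I know, the default "vst" method fits a trend of variance against mean and ranks genes by standardized variance, while this sketch only compares a simple variance-to-mean ratio. The two genes and their counts are made up for illustration. Both have the same mean, so any mean-based trend would expect similar variance from them, yet the gene concentrated in 12 cells is far more dispersed.

```python
from statistics import mean, pvariance


def dispersion(counts):
    """Variance-to-mean ratio of one gene; 0.0 if the gene is never detected."""
    m = mean(counts)
    return pvariance(counts) / m if m > 0 else 0.0


n_cells = 1000

# Hypothetical gene detected in only 12 of 1000 cells, 30 counts in each.
sparse_gene = [30] * 12 + [0] * (n_cells - 12)

# Hypothetical gene with the same total (360 counts) spread over 360 cells.
broad_gene = [1] * 360 + [0] * (n_cells - 360)

print("mean (sparse):", mean(sparse_gene))
print("mean (broad): ", mean(broad_gene))
print("dispersion (sparse):", dispersion(sparse_gene))
print("dispersion (broad): ", dispersion(broad_gene))
```

The sparse gene's dispersion comes out dozens of times larger than the broad gene's, which is why a gene seen in a handful of cells can rank high in a variance-based selection.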
But can we say that this is biologically informative? If this gene had been highly expressed specifically within a single cluster, then I would say it had some meaning. However, since the cells expressing this gene are randomly distributed on the UMAP plot, I conclude that it cannot have much significance and that it doesn't contribute much to the clustering either.
Can we impose a filter that requires a gene to be expressed in at least 20 cells in order to be selected as a variable gene?
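One way to picture such a filter is sketched below in plain Python. The function name `genes_detected_in_min_cells`, the gene "GeneA", and all counts are hypothetical and only serve the illustration. In Seurat itself, `CreateSeuratObject()` has a `min.cells` argument that drops features detected in fewer than that many cells when the object is built, which would apply the same idea before `FindVariableFeatures` is run.

```python
def genes_detected_in_min_cells(counts_by_gene, min_cells=20):
    """Return the genes with a nonzero count in at least `min_cells` cells."""
    return [
        gene
        for gene, counts in counts_by_gene.items()
        if sum(1 for c in counts if c > 0) >= min_cells
    ]


# Made-up counts across 1000 cells.
counts_by_gene = {
    "Trbv17": [30] * 12 + [0] * 988,   # detected in 12 cells
    "GeneA": [1] * 360 + [0] * 640,    # hypothetical gene, detected in 360 cells
}

kept = genes_detected_in_min_cells(counts_by_gene, min_cells=20)
print(kept)
```

With a threshold of 20 cells, Trbv17 is dropped before variable gene selection and only the broadly detected gene remains. The threshold is a judgment call: set too high, it could also remove markers of genuinely rare cell types.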