Hello.
I previously had a problem where I could not produce a violin plot or a feature plot for some genes in my Seurat objects because of NA values.
By checking my raw matrices, I finally identified the source of the problem, but I cannot understand it.
## I generated my Seurat object:
data <- Read10X("mypath/filtered_feature_bc_matrix")
object <- CreateSeuratObject(counts = data, min.cells = 3, min.features = 200)
## Then I checked for the presence of my gene:
"IFNA5" %in% rownames(object)
## Which returned
FALSE
## Then I redid everything without the min.cells and min.features arguments, and this time it returned
TRUE
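This difference is what `min.cells` would be expected to produce: `CreateSeuratObject(min.cells = 3)` keeps only features detected in at least 3 cells, so a gene with no counts in any cell is dropped before the object is built. A minimal base-R sketch of that kind of filter, on a made-up toy matrix (the counts, the cell names and `GENE_X` are hypothetical, and this is not Seurat's internal code):

```r
# Toy count matrix (genes x cells); all values are invented for illustration.
counts <- matrix(
  c(0, 0, 0, 0,   # IFNA5: no counts in any cell
    5, 0, 2, 1,   # IFIT1: detected in 3 cells
    0, 1, 0, 0),  # GENE_X (hypothetical): detected in 1 cell
  nrow = 3, byrow = TRUE,
  dimnames = list(c("IFNA5", "IFIT1", "GENE_X"), paste0("cell", 1:4))
)

min_cells <- 3

# Number of cells in which each gene has at least one count
cells_per_gene <- rowSums(counts > 0)

# Keep only genes detected in at least `min_cells` cells
filtered <- counts[cells_per_gene >= min_cells, , drop = FALSE]

"IFNA5" %in% rownames(counts)    # present before filtering
"IFNA5" %in% rownames(filtered)  # dropped by the min.cells-style filter
```

So the FALSE/TRUE difference by itself only says that the gene was detected in fewer than 3 cells, not that it is missing from the reference.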
I then pre-processed my object using the standard pipeline, and computed a FeaturePlot of this gene.
The result is clear:
All cells have the same value (0) for "IFNA5".
Which explains it all: the expression is 0 for EVERY cell. The same is true for all of my interferon genes. I cannot explain this: I have PBMCs, so I should see at least some interferon somewhere, given that I'm working with stimulated samples in which interferon production should be skyrocketing. And I do see very high expression of interferon-induced genes like IFITs and ISGs, but the interferons themselves are at 0. I reckon this cannot be biological.
Does anyone have an explanation? Why would my interferon genes be at 0 everywhere? Can it come from the sequencing itself? There are some other genes with the same problem.
Thank you in advance
Do you have some QC metrics to show us? For example the number of cells you have, the number of counts and features, the percentage of mitochondrial genes, etc.
These are the raw matrices, so I have 2 million features, going down to ~7000 after the following filtering: nCount_RNA > 500 & nCount_RNA < 20000 & percent.mt < 12 & percent.ribo < 30
What worries me is that in any case the expression of my genes is at 0 everywhere, even without any filtering or processing, so the problem seems to be in the original data.
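One way to confirm this on the unfiltered data is to sum the raw counts per gene across all barcodes, before any Seurat processing. A sketch in base R, with a made-up matrix standing in for the output of `Read10X` (for the real sparse matrix, `Matrix::rowSums` would be the analogous call; the toy values and the `^IFN[ABGW]` name pattern are assumptions for illustration only):

```r
# Toy stand-in for the raw genes x barcodes matrix; values are invented.
counts <- matrix(
  c(0, 0, 0, 0,   # IFNA5
    0, 2, 0, 1,   # IFNG
    7, 3, 0, 9,   # IFIT1
    4, 0, 6, 2),  # ISG15
  nrow = 4, byrow = TRUE,
  dimnames = list(c("IFNA5", "IFNG", "IFIT1", "ISG15"), paste0("cell", 1:4))
)

# Select interferon genes by name (assumed naming pattern, adjust as needed)
ifn_genes <- grep("^IFN[ABGW]", rownames(counts), value = TRUE)

# Total raw counts per selected gene across all barcodes
total_counts <- rowSums(counts[ifn_genes, , drop = FALSE])
total_counts

# Genes whose total is zero have no counts assigned in any barcode
zero_genes <- names(total_counts)[total_counts == 0]
zero_genes
```

If a gene has a row in the matrix but a total of zero, it is present in the annotation used for counting and no counts were assigned to it.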
Your features in this context are genes, not cells. Going from 2M cells to 7000 with these QC parameters sounds fishy. Can you make a violin plot of each QC metric?
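For reference, the per-cell QC metrics discussed here can be derived directly from the count matrix, and the thresholds quoted above applied to them. A minimal base-R sketch on invented data; the `^MT-` and `^RP[SL]` prefixes are the usual human gene-name conventions and are an assumption about how `percent.mt` and `percent.ribo` were computed:

```r
# Toy genes x cells matrix; all counts are invented for illustration.
counts <- matrix(
  c( 50, 300,  10,   # MT-CO1 (mitochondrial)
    200, 100,  40,   # RPL3   (ribosomal)
    400, 300, 100,   # IFIT1
    350, 300,  50),  # ISG15
  nrow = 4, byrow = TRUE,
  dimnames = list(c("MT-CO1", "RPL3", "IFIT1", "ISG15"), paste0("cell", 1:3))
)

is_mt   <- grepl("^MT-", rownames(counts))
is_ribo <- grepl("^RP[SL]", rownames(counts))

# One row of QC metrics per cell (column of the count matrix)
qc <- data.frame(
  nCount_RNA   = colSums(counts),
  nFeature_RNA = colSums(counts > 0),
  percent.mt   = 100 * colSums(counts[is_mt, , drop = FALSE]) / colSums(counts),
  percent.ribo = 100 * colSums(counts[is_ribo, , drop = FALSE]) / colSums(counts),
  row.names    = colnames(counts)
)

# Same thresholds as in the filtering step quoted in the thread
keep <- qc$nCount_RNA > 500 & qc$nCount_RNA < 20000 &
  qc$percent.mt < 12 & qc$percent.ribo < 30

qc
colnames(counts)[keep]  # cells that pass all four thresholds
```

In Seurat itself, the same columns live in the object's metadata and can be plotted with `VlnPlot`.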
Quoting this:
Do you mean all your genes of interest (interferon-related) or 100% of the genes in your genes x cells matrix?
My genes of interest are at 0. And some others. Globally, aside from these, the data is super clean and downstream analyses work very well.
Here are the plots :
Ask the sequencing platform to provide the GTF/GFF file they used to count your features. Look in this file for your genes of interest and check how they are annotated. If you cannot find them, the problem is the annotation file. If you can find them, check in your BAM files whether there are reads at the positions given for those genes in the annotation file. Investigate.
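The annotation lookup can be sketched in base R as follows. The GTF lines below are entirely invented (contig name, coordinates and gene IDs are placeholders); a real file would be read with something like `readLines("genes.gtf")`, where the path is hypothetical:

```r
# Toy GTF content; every value here is made up for illustration.
gtf_lines <- c(
  '##description: toy annotation',
  'chrN\ttoy\tgene\t100\t900\t.\t+\t.\tgene_id "GENE0001"; gene_name "IFNA5";',
  'chrN\ttoy\texon\t100\t900\t.\t+\t.\tgene_id "GENE0001"; gene_name "IFNA5";',
  'chrN\ttoy\tgene\t2000\t5000\t.\t-\t.\tgene_id "GENE0002"; gene_name "IFIT1";'
)

# Drop header/comment lines, then split the 9 tab-separated GTF columns
gtf_lines <- gtf_lines[!startsWith(gtf_lines, "#")]
fields <- strsplit(gtf_lines, "\t", fixed = TRUE)

gtf <- data.frame(
  seqname    = vapply(fields, `[`, character(1), 1),
  feature    = vapply(fields, `[`, character(1), 3),
  start      = as.integer(vapply(fields, `[`, character(1), 4)),
  end        = as.integer(vapply(fields, `[`, character(1), 5)),
  attributes = vapply(fields, `[`, character(1), 9),
  stringsAsFactors = FALSE
)

# Extract gene_name from the attribute column (NA when the key is absent)
has_name <- grepl('gene_name "', gtf$attributes, fixed = TRUE)
gtf$gene_name <- ifelse(
  has_name,
  sub('.*gene_name "([^"]+)".*', "\\1", gtf$attributes),
  NA_character_
)

# Gene-level records for the gene of interest
hits <- gtf[which(gtf$feature == "gene" & gtf$gene_name == "IFNA5"), ]
hits[, c("seqname", "start", "end", "gene_name")]
```

If `hits` is empty, the gene is not annotated under that name. Otherwise `seqname`, `start` and `end` give the region to inspect in the BAM file, for example in a genome browser.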
What was your method for mapping reads? What reference did you use?
I didn't do it myself; the sequencing platform provided the matrices directly after running CellRanger.