Good afternoon,
I am working with a dataset of 7 patients. I have merged all raw counts to one count object with JoinLayers:
Now I would like to know whether there are certain genes differentially expressed between my Seurat clusters. For that I ran
markers_all<-FindAllMarkers(kid.filtered_new,
test.use = 'DESeq2',
slot = 'counts')
However, it does not find any differentially expressed genes, which I believe can't be true.
I wonder whether it has to do with the way I joined the counts of the 7 patients.
kid.filtered_new<- JoinLayers(kid.filtered)
pbmc_counts_combined <- GetAssayData(kid.filtered_new, slot = 'counts')
pred_combined <- SingleR(test = pbmc_counts_combined,
ref = ref,
labels = ref$label.main)
kid.filtered_new$singleR.labels <- pred_combined$labels[match(rownames(kid.filtered_new@meta.data), rownames(pred_combined))]
Maybe I should perform all the data preparation jobs again after joining the data (I ran all this before joining the layers, which I believe is then done by patient (layer) individually):
kid.filtered <- NormalizeData(object = kid.filtered)
kid.filtered <- FindVariableFeatures(object = kid.filtered)
kid.filtered <- ScaleData(object = kid.filtered)
kid.filtered <- RunPCA(object = kid.filtered)
kid.filtered <- FindNeighbors(object = kid.filtered, dims = 1:20)
kid.filtered <- FindClusters(object = kid.filtered, resolution=0.1) # directly specify which resolution
kid.filtered <- RunUMAP(object = kid.filtered, dims = 1:20)
Thank you for any ideas!
Bine
it looks like you merged the raw data for each sample. Have you considered integration/batch effect correction? This step is often important for comparing scRNA-seq data between samples
Thank you very much for your reply. First of all I should mention that I recently started working with scRNA Seq. What I read so far is that for the differential expression analysis it is not recommended to use integrated counts, so it wouldnt be relevant? See: https://github.com/satijalab/seurat/issues/1717
Apart from that, I read that "ScaleData" should take care of batch effects?
Beside this I am also a bit confused when to perform the above standard pre-process steps - should I do it before or after joining the layers or both?
Thank you again for your help!!
ScaleData
will scale, and optionally center, the gene counts. This is a very different process than integration and batch correction. There's a good description ofScaleData
available at https://github.com/satijalab/seurat/issues/1166The Satija lab suggests that the counts generated after integration should not be used for differential expression testing. But that is not to say that integration shouldn't be performed. The key thing is that integrating the data allows you to find (via clustering) groups of similar cells across samples specifically while compensating for batch effects which in turn may yield more accurate clustering results.
You would then use the cluster labels to pull out the raw counts of the appropriate cells for doing DE testing, e.g., cluster 1 contains your cell type of interest and there is a mix of cells from each sample in cluster 1. So save the raw data as a separate object from the integrated data.
Thank you very much that makes sense. Sorry I have two more questions on this:
1) In the meantime I have found the argument "latent.vars" in the FindAllMarkers function: https://www.rdocumentation.org/packages/Seurat/versions/2.3.0/topics/FindAllMarkers
Could I expect that this has a similar effect on the batch effect as the integration? Because it still seems to me that I should adjust for batch effect when performing the differential expression analysis.
2) I am facing issues with the annotation after integration as it seems that SingleR (which I am using for annotation) does not take integrated counts for annotation: SingleR for integrated scRNA-seq analysis
Do you have any idea on this?
Thanks again for your help, I really appreciate it!