Good afternoon,
I am currently preparing a signature matrix using CIBERSORT. The first step requires a single-cell reference sample file, which is basically a simple txt file with the genes in the rows and the expression of each single cell, labelled by its cell type, in the columns.
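For illustration, the file layout described above can be sketched in Python. This is only a minimal sketch under assumptions: the helper `write_reference`, the `GeneSymbol` header, the tab delimiter, and the toy genes, labels, and counts are all hypothetical, and the exact format expected should be checked against the CIBERSORT documentation.

```python
import csv
import io


def write_reference(handle, genes, cell_types, counts):
    """Write a genes-by-cells matrix as delimited text.

    counts[i][j] is the expression of genes[i] in cell j, and
    cell_types[j] is the annotated cell type of cell j.
    The header name and tab delimiter are assumptions.
    """
    if len(counts) != len(genes):
        raise ValueError("one row of counts per gene is required")
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    # Header row: one column per single cell, named by its cell type.
    writer.writerow(["GeneSymbol"] + list(cell_types))
    for gene, row in zip(genes, counts):
        if len(row) != len(cell_types):
            raise ValueError(f"row for {gene} does not match the number of cells")
        writer.writerow([gene] + list(row))


# Toy data (hypothetical): 3 genes measured in 4 annotated cells.
genes = ["CD3E", "CD19", "LYZ"]
cell_types = ["T_cell", "T_cell", "B_cell", "Monocyte"]
counts = [
    [5, 7, 0, 0],
    [0, 0, 9, 0],
    [0, 1, 0, 12],
]

buffer = io.StringIO()
write_reference(buffer, genes, cell_types, counts)
reference_text = buffer.getvalue()
print(reference_text)
```

The sketch writes to an in-memory buffer; passing an open file handle instead would produce the txt file. It does not take a position on whether the values should come from integrated or non-integrated data.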
To create this file, I am using 7 scRNA-seq samples and annotated the clusters with the help of Seurat/SingleR. The issue is that there is a batch effect between these 7 samples. For that reason, I integrated the data for clustering purposes in my previous analysis. For the differential expression analysis, however, I had to use the non-integrated data.
My question is: for the single-cell reference sample file, should I use the integrated or the non-integrated data?
Thank you very much for your help!
I don't think you want your signature matrix to consist of multiple samples. Is there a reason you are trying to deconvolute multiple samples at a time? What are the 7 samples? Are they replicates, or different conditions, time points, etc.?
jv, I thought the signature matrix would be more powerful/reliable if I used more than one sample, since a single sample could introduce bias. The 7 samples (from 7 different patients) are all tumour samples of one cancer at diagnosis, so they are pretty similar. In a previous analysis of these 7 samples, however, I saw that the gene expression was clustering by sample/patient. I therefore integrated the data to better identify the cell clusters (instead of clusters by sample).