9 days ago
Fossil
Hello, I am trying to use publicly available scRNA-seq data to deconvolve my own bulk RNA-seq data, working with Seurat and CIBERSORTx. Unfortunately, my “single cell reference sample file” is too large to upload to CIBERSORTx for creating a signature matrix (>80,000 cells). I was therefore planning to aggregate the counts per subcluster. Thanks a lot in advance for your help and your time.
Questions:
- 1 - Is summing the counts using ‘AggregateExpression’ the way to go?
- 2a - Do I need to log-transform/normalize the scRNA-seq data (e.g., using NormalizeData() in Seurat) before aggregating counts for the signature matrix? Should I apply the same to the bulk counts?
- 2b - CIBERSORTx requires that the scRNA-seq and bulk RNA files undergo the same processing (normalized or not) when generating the signature matrix (step 1). Does this also apply to the “Impute Cell Fractions” step (step 2)?
- 2c - Is it correct that CIBERSORTx normalizes data internally during the cell fraction imputation step? If so, does that mean I can use raw bulk counts directly from featureCounts without normalization?
- 3 - Can I use this sum-aggregated matrix directly as a ‘signature matrix’ to impute cell fractions?
Here is my current code:
# Load Seurat and read the count matrix
library(Seurat)
expressionmatrix <- ReadMtx(mtx = "genesortedmatrix.mtx", features = "genesv2.tsv", cells = "barcodes.tsv")
# Create Seurat object
Seurat_object <- CreateSeuratObject(expressionmatrix)
# Find variable features
Seurat_object <- FindVariableFeatures(Seurat_object, selection.method = "vst", nfeatures = 2000)
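# Aggregate counts to create a subcluster signature matrix
# (this assumes a "Subcluster" column is present in Seurat_object@meta.data; adding it is not shown here)
subcluster_counts <- AggregateExpression(Seurat_object, group.by = "Subcluster")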
# Extract the summed count expression matrix
signature_matrix <- subcluster_counts$RNA
rownames(signature_matrix) <- make.unique(rownames(signature_matrix))
# Replace any NA values with 0
signature_matrix[is.na(signature_matrix)] <- 0
# Convert matrix to df
signature_matrix_df <- as.data.frame(signature_matrix)
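# Subset the signature matrix to include only genes present in the bulk mixture data
# (bulkdata is the genes x samples bulk count matrix with gene names as row names, loaded beforehand)
common_genes <- intersect(rownames(signature_matrix_df), rownames(bulkdata))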
# Filter both datasets to include only these common genes
signature_matrix_filtered <- signature_matrix_df[common_genes, ]
bulkdata_filtered <- bulkdata[common_genes, ]
# Export the signature matrix and the filtered bulk mixture as tab-delimited files
# (col.names = NA writes an empty header cell above the gene-name column, so the header lines up with the data)
write.table(signature_matrix_filtered, "signature.txt", sep = "\t", row.names = TRUE, col.names = NA, quote = FALSE)
write.table(bulkdata_filtered, "bulk.txt", sep = "\t", row.names = TRUE, col.names = NA, quote = FALSE)
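To make question 1 concrete, here is a toy base-R sketch of what I understand by sum aggregation per subcluster, followed by counts-per-million (CPM) scaling as one possible normalization. The matrix, gene names, and subcluster labels are made up for illustration. It does not use Seurat and is not meant to represent what CIBERSORTx does internally.

```r
# Toy example: 4 genes x 6 cells with made-up counts
counts <- matrix(
  c(5, 0, 2, 1, 0, 3,
    0, 4, 1, 0, 2, 2,
    1, 1, 0, 6, 3, 0,
    2, 0, 0, 1, 1, 4),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:6))
)

# Hypothetical subcluster label for each cell, in column order
subcluster <- c("A", "A", "B", "B", "C", "C")

# Sum raw counts over the cells of each subcluster.
# rowsum() groups rows, so transpose to cells x genes and back to genes x subclusters.
pseudobulk <- t(rowsum(t(counts), group = subcluster))

# Scale each subcluster column to counts per million, staying in linear (non-log) space
cpm <- sweep(pseudobulk, 2, colSums(pseudobulk), "/") * 1e6

print(pseudobulk)
print(round(cpm))
```

If this is the right idea, the same CPM scaling could presumably be applied to the bulk counts so that both files are processed identically, but that is exactly what I am unsure about in questions 2a to 2c.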
Thanks a lot again!