How to create a signature matrix from scRNA-seq data for downstream CIBERSORTx deconvolution of bulk data when the reference file is too large to upload?
9 days ago
Fossil ▴ 30

Hello, I am trying to use publicly available scRNA-seq data to deconvolve my own bulk RNA-seq data, working with Seurat and CIBERSORTx. Unfortunately, my “single cell reference sample file” is far too large to create a signature matrix on CIBERSORTx (>80,000 cells; it won’t upload). Therefore, I am planning to aggregate the counts per subcluster instead. Thanks a lot in advance for the help and your time.

Questions:

  • 1 - Is summing the counts using ‘AggregateExpression’ the way to go?
  • 2a - Do I need to log-transform/normalize the scRNA-seq data (e.g., using NormalizeData() in Seurat) before aggregating counts for the signature matrix? Should I apply the same to the bulk counts?
  • 2b - CIBERSORTx requires that the scRNA-seq and bulk RNA files undergo the same processing (normalized or not) when generating the signature matrix (step 1). Does this also apply to the “Impute Cell Fractions” step (step 2)?
  • 2c - Is it correct that CIBERSORTx normalizes data internally during the cell fraction imputation step? If so, does that mean I can use raw bulk counts directly from featureCounts without normalization?
  • 3 - Can I use this sum-aggregated matrix directly as a ‘signature matrix’ to impute cell fractions?
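
To make questions 2a and 2b concrete, here is a minimal sketch of what I mean by "the same processing" applied to both inputs. It assumes counts-per-million scaling is the normalization in question, which is only my assumption and not something I have confirmed CIBERSORTx expects. The helper `to_cpm` and the toy matrices are hypothetical and only stand in for my real aggregated reference and bulk counts.

```r
# Hypothetical helper (not part of Seurat or CIBERSORTx):
# scale every column to counts per million (CPM)
to_cpm <- function(counts) {
  lib_sizes <- colSums(counts)
  sweep(counts, 2, lib_sizes, "/") * 1e6
}

# Toy matrices standing in for the aggregated reference and the bulk counts
genes <- c("GeneA", "GeneB", "GeneC")
ref_counts <- matrix(c(10, 30, 60,
                       5, 5, 90),
                     nrow = 3,
                     dimnames = list(genes, c("SubclusterX", "SubclusterY")))
bulk_counts <- matrix(c(200, 300, 500),
                      nrow = 3,
                      dimnames = list(genes, "Bulk1"))

# Apply the identical transformation to both inputs
ref_cpm <- to_cpm(ref_counts)
bulk_cpm <- to_cpm(bulk_counts)
```

After this step every column of both matrices sums to one million, so the reference and the bulk samples would at least be on the same scale.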

Here is my current code:

# Load Seurat
library(Seurat)

# Load the data
expressionmatrix <- ReadMtx(mtx = "genesortedmatrix.mtx", features = "genesv2.tsv", cells = "barcodes.tsv")
# Create Seurat object
Seurat_object <- CreateSeuratObject(expressionmatrix)
# Find variable features
Seurat_object <- FindVariableFeatures(Seurat_object, selection.method = "vst", nfeatures = 2000)

# Aggregate counts to create a subcluster signature matrix
# (assumes a "Subcluster" column has already been added to the object's metadata)
subcluster_counts <- AggregateExpression(Seurat_object, group.by = "Subcluster")
# Extract the summed count expression matrix
signature_matrix <- subcluster_counts$RNA
rownames(signature_matrix) <- make.unique(rownames(signature_matrix))
# Replace any NA values with 0
signature_matrix[is.na(signature_matrix)] <- 0
# Convert to a dense matrix first (the result may be sparse), then to a data frame
signature_matrix_df <- as.data.frame(as.matrix(signature_matrix))

# Subset the signature matrix to include only genes present in the bulk mixture data
# (assumes bulkdata is an already loaded gene-by-sample count table with gene names as row names)
common_genes <- intersect(rownames(signature_matrix_df), rownames(bulkdata))
# Filter both datasets to include only these common genes
signature_matrix_filtered <- signature_matrix_df[common_genes, , drop = FALSE]
bulkdata_filtered <- bulkdata[common_genes, , drop = FALSE]

# Export the signature matrix and the filtered bulk mixture as tab-delimited files
# (col.names = NA adds an empty header cell above the gene names so the header lines up with the data)
write.table(signature_matrix_filtered, "signature.txt", sep = "\t", row.names = TRUE, col.names = NA, quote = FALSE)
write.table(bulkdata_filtered, "bulk.txt", sep = "\t", row.names = TRUE, col.names = NA, quote = FALSE)
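
As a sanity check on what I expect sum aggregation to produce, here is a small base-R sketch on toy data. It is not Seurat's implementation, only an illustration of summing raw counts per subcluster; the matrix `counts` and the labels in `subcluster` are made up for the example.

```r
# Toy gene-by-cell count matrix (3 genes, 4 cells), filled column by column
counts <- matrix(c(1, 0, 2,
                   3, 1, 0,
                   0, 4, 1,
                   2, 2, 2),
                 nrow = 3,
                 dimnames = list(c("GeneA", "GeneB", "GeneC"),
                                 c("cell1", "cell2", "cell3", "cell4")))

# Made-up subcluster label for each cell, in column order
subcluster <- c("X", "X", "Y", "Y")

# rowsum() sums rows by group, so transpose to cells-by-genes,
# sum within each subcluster, then transpose back to genes-by-subclusters
pseudobulk <- t(rowsum(t(counts), group = subcluster))

print(pseudobulk)
# Column X is cell1 + cell2 = 4, 1, 2; column Y is cell3 + cell4 = 2, 6, 3
```

If the output of AggregateExpression on my real data does not look like plain per-subcluster sums of the raw counts, that would tell me the function is working from a different slot or applying a transformation first.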

Thanks a lot again!

CIBERSORTx deconvolution Seurat scRNA-seq bulkRNA-seq