Question

Co-expression analysis from scRNA-seq data

5.5 years ago

BenHu • 0

I have downloaded a public expression matrix from a scRNA-seq experiment. Does anyone know how to perform gene-gene co-expression analysis, as in the paper "Molecular Diversity and Specializations among the Cells of the Adult Mouse Brain"? Best,

RNA-Seq • 3.0k views

updated 5.5 years ago by Kevin Blighe 89k • written 5.5 years ago by BenHu • 0

score 0 · Answer 1 · 2019-11-11

5.5 years ago

Kevin Blighe 89k

Hey,

You can read the methods of the work that you cite, and, in that way, follow what the authors did. Go here and then go to STAR Methods.

The two sections within those methods that you will want to review are:

ICA based analysis and clustering
Correlation analysis across cell populations

Kevin

If I knew how to do that, I wouldn't have asked this question.

Reply • 5.5 years ago by BenHu • 0

Hello, which part, specifically, are you finding difficult to follow? I took a closer look myself and can deduce the following rough steps to help you get started:

Step 1 - filtering

Filter out cells with fewer than 400 expressed genes
Retain only the genes that are highly variable across all tissues (you can use your own metrics, if you wish)
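To make Step 1 concrete, here is a minimal NumPy sketch. It assumes a genes x cells matrix of raw counts and, as a stand-in for whatever variability metric the authors used, ranks genes by the variance of their log-normalised expression. The function name, the `n_top_genes` parameter, and the per-10,000 scaling are my own illustrative choices, not something taken from the paper.

```python
import numpy as np

def filter_cells_and_genes(counts, min_genes=400, n_top_genes=2000):
    """Filter a genes x cells raw count matrix (illustrative sketch only).

    min_genes follows the threshold quoted above; n_top_genes and the
    variance-based gene ranking are assumptions, not the paper's method.
    """
    counts = np.asarray(counts, dtype=float)

    # A gene is treated as "expressed" in a cell if it has at least one count.
    genes_per_cell = (counts > 0).sum(axis=0)
    keep_cells = genes_per_cell >= min_genes
    filtered = counts[:, keep_cells]

    # Library-size normalise (per 10,000 counts) and log-transform, so that
    # variability is not dominated by sequencing depth.
    lib_size = filtered.sum(axis=0)
    norm = np.log1p(filtered / lib_size * 1e4)

    # Keep the n_top_genes most variable genes, preserving original row order.
    gene_var = norm.var(axis=1)
    n_top = min(n_top_genes, norm.shape[0])
    keep_genes = np.sort(np.argsort(gene_var)[::-1][:n_top])

    return norm[keep_genes, :], keep_cells, keep_genes
```

The function returns the filtered matrix together with the cell mask and the gene indices, so the same selection can be applied to gene and cell labels.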

Step 2 - ICA (independent component analysis)

Convert highly variable gene matrices to Z-scores ("[The] selected genes were then centered and scaled across all cells")
Perform ICA using the fastICA package in R, configured to output the first 60 components, and run it separately on each tissue.
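The centering and scaling part of Step 2 can be sketched as below. This assumes a genes x cells matrix that has already been normalised and filtered; the function name is hypothetical, and setting the standard deviation of constant genes to 1 is my own guard against division by zero.

```python
import numpy as np

def zscore_genes(expr):
    """Centre and scale each gene (row) across all cells (columns)."""
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1, keepdims=True)
    sd = expr.std(axis=1, ddof=1, keepdims=True)  # sample standard deviation
    sd[sd == 0] = 1.0  # assumption: constant genes become rows of zeros
    return (expr - mean) / sd
```

The Z-scored matrix is what would then be handed to an ICA routine, for example fastICA in R with `n.comp = 60`, or `sklearn.decomposition.FastICA(n_components=60)` if you prefer to stay in Python.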

Step 3 - KNN clustering

Perform clustering on the 60 ICA components using the clustering implementation in Seurat. Basically, re-use Seurat's functions FindNeighbors() and FindClusters(). To give you an idea, I use these in a function in a package that I'm currently developing: https://github.com/kevinblighe/scToolkit/blob/master/R/clusKNN.R
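For intuition about what the neighbour-graph half of Step 3 does, here is a rough NumPy sketch of a shared-nearest-neighbour graph: find each cell's k nearest neighbours in component space, then weight each pair of cells by the Jaccard overlap of their neighbour sets. This is not Seurat's implementation; the function name and the default `k` are assumptions, and the community-detection step that FindClusters() performs on such a graph is not shown.

```python
import numpy as np

def snn_graph(components, k=10):
    """Jaccard-weighted shared-nearest-neighbour graph (illustrative only).

    components: cells x components matrix, e.g. cells x 60 ICA scores.
    k must be smaller than the number of cells.
    """
    x = np.asarray(components, dtype=float)
    n = x.shape[0]

    # Pairwise squared Euclidean distances between cells.
    sq = (x ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    np.fill_diagonal(d2, np.inf)  # here a cell is not its own neighbour

    # Indices of the k nearest neighbours of every cell.
    nn = np.argsort(d2, axis=1)[:, :k]

    # member[i, j] == 1 if cell j is among the k nearest neighbours of cell i.
    member = np.zeros((n, n), dtype=int)
    member[np.arange(n)[:, None], nn] = 1

    # Jaccard index of the two neighbour sets: |A & B| / |A | B|.
    shared = member @ member.T
    union = 2 * k - shared
    return shared / union
```

Cells from well-separated groups end up with zero edge weight between them, which is what lets a graph-clustering algorithm pull the groups apart.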

--------------------------

That should bring you up to the line "To identify finer substructure among these classes, classes with more than 200 cells were selected for subclustering", after which they commence a second round of ICA on a finer subset of genes, it seems.

Unfortunately, following bioinformatics methods can be a nightmare, because it is impossible to describe accurately, in plain English, every minute detail that a comprehensive methodology requires.

Reply • 5.5 years ago by Kevin Blighe 89k

You might also want to take a look at this article to get ideas for alternatives to Pearson correlation.
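As one example of such an alternative, a rank-based (Spearman) gene-gene correlation can be sketched with NumPy alone. This is my own illustration rather than a measure taken from the linked article, and the function names are hypothetical. Ties get their average rank, which matters for scRNA-seq because of the many zeros.

```python
import numpy as np

def average_ranks(values):
    """1-based ranks of a 1-D array, with tied values sharing their mean rank."""
    values = np.asarray(values, dtype=float)
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)        # last rank occupied by each unique value
    starts = ends - counts + 1      # first rank occupied by each unique value
    return ((starts + ends) / 2.0)[inverse]

def spearman_gene_gene(expr):
    """Spearman correlation between all pairs of genes.

    expr: genes x cells matrix. Genes that are constant across cells
    give NaN, because their ranks have zero variance.
    """
    expr = np.asarray(expr, dtype=float)
    ranks = np.vstack([average_ranks(row) for row in expr])
    # Pearson correlation of the ranks is the Spearman correlation.
    return np.corrcoef(ranks)
```

For comparison, the plain Pearson gene-gene matrix is simply `np.corrcoef(expr)` on the same genes x cells input.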

Reply • 5.5 years ago by Kristoffer Vitting-Seerup ★ 4.2k