Single-cell RNA-seq data preprocessing: how to detect the gene/transcript distribution for each cell
5.5 years ago
sreekalasn • 0

Hello everyone, I have an expression matrix of log(TPM+1) values for 14,000 cells and 23,000 genes (GSE87544). In the paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5782816/#SD9), the authors analysed 14,000 cells and reduced the data to about 3,000 cells and 2,000 genes before using Seurat for cell clustering.

I am new to single-cell sequencing and still learning. I would appreciate help with pre-processing single-cell RNA-seq data (in this case, finding the gene/transcript distribution for each cell), since I could not find sources that discuss pre-processing in detail.

Thank you very much!

scRNA-seq • 2.3k views
5.5 years ago

A good primer on pre-processing single-cell RNA-seq data is Aaron Lun's paper, together with the many simpleSingleCell vignettes (start with "UMI" or "Droplet-based data").

A good introduction focused on QC of scRNA-seq data is also part of the scater package documentation.


Thank you so much. I found these sources very useful.

5.5 years ago

The 2,000 genes are probably the most variable genes across cells (the "highly variable genes"), which are then used for PCA and, downstream, for t-SNE/UMAP.
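
A minimal sketch of that idea, assuming a genes x cells NumPy array of log(TPM+1) values: rank genes by their plain variance across cells, keep the top 2,000, and run PCA on them via SVD. The function names top_variable_genes and pca_scores are hypothetical, and ranking by raw variance is a simplification; Seurat's FindVariableFeatures() uses its own selection methods (e.g. "vst"), so this is not a reproduction of the paper's procedure.

```python
import numpy as np


def top_variable_genes(log_expr: np.ndarray, n_top: int = 2000) -> np.ndarray:
    """Indices of the n_top genes with the highest variance across cells.

    log_expr is a genes x cells matrix of log(TPM+1) values. Plain variance is
    a simplification of what Seurat's FindVariableFeatures() does.
    """
    gene_var = log_expr.var(axis=1)            # variance of each gene across cells
    return np.argsort(gene_var)[::-1][:n_top]  # highest variance first


def pca_scores(log_expr: np.ndarray, n_components: int = 50) -> np.ndarray:
    """Cell coordinates on the first n_components principal components."""
    x = log_expr.T                             # cells x genes
    x = x - x.mean(axis=0)                     # center each gene
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Synthetic example: 5,000 genes x 300 cells (not real data).
rng = np.random.default_rng(0)
log_expr = np.log1p(rng.poisson(2.0, size=(5000, 300)).astype(float))
hvg = top_variable_genes(log_expr, n_top=2000)
pcs = pca_scores(log_expr[hvg], n_components=20)
print(pcs.shape)  # (300, 20)
```

The resulting cells x PCs matrix is what t-SNE or UMAP would typically be run on, rather than the full gene matrix.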

The cell-filtering criteria should be described in the methods of the paper. Common reasons to filter cells in scRNA-seq data include abnormally high UMI counts, a high fraction of mitochondrial gene expression, a low number of detected genes, low sequencing depth, and doublets. The exact procedure also depends on the Seurat version.
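
A minimal sketch of a few of these QC filters, assuming a genes x cells matrix of raw counts (UMIs or reads) and mouse gene symbols where mitochondrial genes start with "mt-". The function name qc_keep_mask and every threshold are hypothetical placeholders, not the paper's values; doublet detection needs dedicated tools and is not shown.

```python
import numpy as np


def qc_keep_mask(counts, gene_names, min_genes=200, max_total=None,
                 max_mito_frac=0.1, mito_prefix="mt-"):
    """Boolean mask of cells passing simple QC thresholds.

    counts: genes x cells matrix of raw counts (UMIs or reads).
    All thresholds are illustrative placeholders, not the paper's values.
    """
    counts = np.asarray(counts)
    detected = (counts > 0).sum(axis=0)        # genes detected per cell
    total = counts.sum(axis=0)                 # library size per cell
    is_mito = np.array([g.lower().startswith(mito_prefix.lower())
                        for g in gene_names])
    mito = counts[is_mito].sum(axis=0)
    mito_frac = np.divide(mito, total, out=np.zeros(total.shape, dtype=float),
                          where=total > 0)     # avoid dividing by zero
    keep = (detected >= min_genes) & (mito_frac <= max_mito_frac)
    if max_total is not None:                  # crude cap on very large libraries
        keep &= total <= max_total
    return keep


genes = ["Actb", "Gapdh", "mt-Co1", "Snap25"]
counts = np.array([[5, 0, 1],
                   [3, 0, 0],
                   [10, 1, 50],
                   [0, 0, 2]])
print(qc_keep_mask(counts, genes, min_genes=2, max_mito_frac=0.6))
# [ True False False]
```

Here cell 2 is dropped for too few detected genes and cell 3 for a mitochondrial fraction of about 0.94. Note that the matrix in the question is log(TPM+1), so count-based criteria such as UMI totals cannot be applied to it directly.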

A quick read of the paper gives: "From the 14,000 cells analyzed, 3,319 cells have more than 2,000 genes detectable in a single cell".
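
That criterion can be sketched directly on a log(TPM+1) matrix, assuming genes are rows and cells are columns and that "detectable" means a non-zero value (log(TPM+1) > 0 exactly when TPM > 0). The helper names are hypothetical, and the matrix below is random synthetic data standing in for GSE87544, not the real dataset. The percentile printout is one simple way to look at the per-cell gene distribution before choosing a cutoff.

```python
import numpy as np


def genes_detected_per_cell(log_tpm: np.ndarray) -> np.ndarray:
    """Number of genes with non-zero expression in each cell.

    log_tpm: genes x cells matrix of log(TPM+1); a value > 0 means TPM > 0.
    """
    return (log_tpm > 0).sum(axis=0)


def keep_cells_with_many_genes(log_tpm: np.ndarray, min_genes: int = 2000):
    """Keep cells with MORE than min_genes detected genes (the quoted criterion)."""
    n_genes = genes_detected_per_cell(log_tpm)
    keep = n_genes > min_genes
    return log_tpm[:, keep], keep


# Synthetic stand-in: 23,000 genes x 200 cells with varying detection rates.
rng = np.random.default_rng(1)
rate = rng.uniform(0.03, 0.15, size=200)          # per-cell detection probability
tpm = rng.exponential(5.0, size=(23000, 200)) * (rng.random((23000, 200)) < rate)
log_tpm = np.log1p(tpm)

n_genes = genes_detected_per_cell(log_tpm)
print(np.percentile(n_genes, [5, 25, 50, 75, 95]))  # genes-per-cell distribution
filtered, keep = keep_cells_with_many_genes(log_tpm, min_genes=2000)
print(filtered.shape, keep.sum())
```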

It's a pity that you did not make a minimal effort to read the paper you are interested in.


I did go through the paper multiple times. However, the authors have not described in detail how they filtered the data and found the "highly variable genes". They referenced another article, but again, I could not understand the filtering part. Hence, I posted the question here hoping to receive some help. Thanks for the heads-up on the plausible factors for filtering scRNA data.
