Single Cell RNA Seq. Analysis
6
7
5.7 years ago
saqlain ▴ 90

Hi all, I am new to bioinformatics and quite a beginner in programming. Can anyone suggest some resources where I can learn at least 50% of scRNA-seq data analysis? I am familiar with the C language, and I know a little bit of molecular biology too.

RNA-Seq • 6.2k views
0

Thank you all, I am highly indebted. I have started working through everything you have suggested, and any further help will be equally appreciated.

18
5.6 years ago

The Hemberg Lab has a very useful introduction to scRNA-seq analysis using R.

In terms of R packages, and for actually understanding why the objects are becoming fairly complex, the explanations from the Bioconductor people are quite insightful. The principles are applicable to Seurat, too, although the names of the accessor functions (for, say, retrieving the matrix of read counts) will be different. The accompanying book is here.

Generally, there are these steps that the analysis will involve:

  1. Read alignment (FASTQ --> BAM): this depends a bit on the type of data you have. For data from the 10X Genomics platform, 10X offers its CellRanger software, but there are other tools like alevin and STARsolo. This step is usually done for all NGS data, but it is slightly more complicated for single-cell data because the tools need to keep track of where each read came from (which cell and, if UMIs were used, which transcript).
  2. Count matrix generation: The first major goal is to obtain a matrix of read counts per gene, where rows usually correspond to genes and columns to cells. For single-cell RNA-seq, this is usually part of the alignment step.
  3. Filtering, normalization, batch correction, ...: this is where scRNA-seq becomes really frustrating, even for experienced bioinformaticians, because there's no real consensus yet as to how scRNA-seq data should be normalized. This is why many people will point you to Seurat, which pretends it has it all figured out by providing functions that are aptly named NormalizeData and ScaleData. If your data looks similar to what people have been working with, the default settings may work.
  4. Dimensionality reduction: t-SNE, PCA, UMAP, ... These are techniques that let you represent the data in x-y coordinates (= 2 dimensions) rather than in the original number of dimensions of your count matrix (probably something like 30,000 genes x 10,000 cells).
  5. Clustering cells: this is usually done with graph-based methods because they seem to offer the best compromise between speed and accuracy for single-cell data. There are a couple of excellent reviews on the topic (Menon 2018, Kiselev 2019) and some benchmarking papers (Duo 2019, Freytag 2019).
  6. Assigning labels to cells: this is the main goal of many scRNA-seq data sets these days, and it's usually quite tricky. In principle, we expect to see certain genes that are only expressed in certain clusters of cells (marker genes), and based on those we try to infer the "cell type". While not very technical in nature, I found the discussions by Jesse Gilles and Meghan Crow (here and here) quite insightful.
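
To make steps 2 and 3 a bit more concrete, here is a minimal sketch in plain Python (standard library only, not any particular package's implementation). It assumes the aligner has already produced (cell barcode, gene, UMI) records; the toy records, the `min_total` threshold and the scale factor of 10,000 are all made-up values for illustration. Reads sharing the same cell, gene and UMI are counted once, cells with too few molecules are dropped, and the remaining counts are scaled by each cell's total and log-transformed, which is one common normalization choice among several.

```python
import math
from collections import defaultdict

# Hypothetical aligned-read records: (cell barcode, gene, UMI).
reads = [
    ("cellA", "Gapdh", "AAAC"),
    ("cellA", "Gapdh", "AAAC"),  # same cell/gene/UMI: duplicate, counted once
    ("cellA", "Gapdh", "GGTT"),
    ("cellA", "Cd3e", "TTGA"),
    ("cellB", "Gapdh", "CCAT"),
    ("cellB", "Ms4a1", "ACGT"),
    ("cellB", "Ms4a1", "TGCA"),
    ("cellB", "Ms4a1", "GATC"),
    ("cellC", "Gapdh", "AGAG"),  # cellC has a single molecule in total
]


def count_matrix(records):
    """Count unique (cell, gene, UMI) triples; result is counts[gene][cell]."""
    counts = defaultdict(lambda: defaultdict(int))
    for cell, gene, _umi in set(records):
        counts[gene][cell] += 1
    return counts


def cell_totals(counts):
    """Total number of counted molecules per cell."""
    totals = defaultdict(int)
    for per_cell in counts.values():
        for cell, n in per_cell.items():
            totals[cell] += n
    return dict(totals)


def filter_cells(counts, min_total):
    """Keep only cells with at least min_total molecules (assumed cutoff)."""
    totals = cell_totals(counts)
    keep = {cell for cell, total in totals.items() if total >= min_total}
    return {
        gene: {cell: n for cell, n in per_cell.items() if cell in keep}
        for gene, per_cell in counts.items()
    }


def log_normalize(counts, scale=10_000):
    """Divide by the cell total, multiply by scale, then apply log(1 + x)."""
    totals = cell_totals(counts)
    return {
        gene: {
            cell: math.log1p(n / totals[cell] * scale)
            for cell, n in per_cell.items()
        }
        for gene, per_cell in counts.items()
    }


counts = count_matrix(reads)                  # genes x cells, sparse dict form
filtered = filter_cells(counts, min_total=2)  # drops the near-empty cellC
normalized = log_normalize(filtered)
for gene, per_cell in sorted(normalized.items()):
    print(gene, {cell: round(v, 2) for cell, v in sorted(per_cell.items())})
```

The nested dictionary only stores non-zero entries, which mirrors why real tools keep these matrices in sparse formats. For actual data, use the established packages mentioned in this thread rather than a hand-rolled version like this.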

In short, as others have pointed out, scRNA-seq is really not ideal for starting out as a bioinformatician because it's a fairly new data type and we're still grappling with all its intricacies and caveats. That being said, you may find more automated solutions like the one provided by the EPFL (asap) useful for playing around with some data. Just be cautious about making bold interpretations and claims.

0

You may also find the slides from a class I recently taught helpful (Chapter 10 is the one focused on scRNA-seq), as well as Lior Pachter's intro.

0

I also want to point out the truly excellent write-up by numerous people involved in developing the infrastructure for analyzing scRNA-seq data with R packages hosted on Bioconductor: "Orchestrating single-cell analysis".

6
5.7 years ago
GenoMax 148k

A comprehensive collection of all things single cell (including tutorials): https://github.com/seandavi/awesome-single-cell

0

Thank you. Can you also suggest some material where I can learn how to implement statistics?

0

"how to implement statistics?"

What do you mean by that?

0

Don't implement statistics yourself. There is specialized software for all common (sc)RNA-seq analyses; please use Google and the search function. Seurat is a good starting point, as mentioned above.

6
5.7 years ago
ATpoint 86k

Single-cell data are rather unpleasant as a beginner's topic due to the noisy and sparse nature of these data. It may be better to first analyze some bulk RNA-seq data to get familiar with R (see here), and then dive into the documentation of Seurat, which is the jack-of-all-trades in terms of scRNA-seq analysis. For low-level processing, alevin is a good choice.
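
As a rough illustration of what "sparse" means here, the following plain-Python sketch computes the fraction of zero entries in a count matrix. The tiny matrix is made up for the example; real scRNA-seq matrices are far larger, but the calculation is the same.

```python
# Hypothetical toy count matrix: rows are genes, columns are cells.
toy_counts = [
    [0, 3, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 2],
    [0, 0, 0, 0],
]


def zero_fraction(matrix):
    """Return the fraction of entries in the matrix that are exactly zero."""
    values = [value for row in matrix for value in row]
    return sum(1 for value in values if value == 0) / len(values)


sparsity = zero_fraction(toy_counts)
print(f"{sparsity:.1%} of entries are zero")
```

A zero in such a matrix can mean the gene is truly not expressed in that cell or simply that its transcripts were not captured, which is part of what makes these data harder to work with than bulk RNA-seq.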

6
5.7 years ago
Bogdan ★ 1.4k

Yes, Seurat would be one of the starting points. Besides the tutorials offered on the Seurat website, a while ago I posted some R code on the Seurat GitHub page: https://github.com/satijalab/seurat/issues/1193 (hope it is helpful).

4
5.7 years ago
Fidel ★ 2.0k

Scanpy has good tutorials that can help you.

2
5.3 years ago
kgosche ▴ 30

Partek Flow is a point-and-click analysis software for single-cell data. You can work with data from any platform and perform QA/QC, filtering, normalization, clustering, visualization, classification, statistical analysis, pathway analysis and so on. It has lots of online documentation as well as tech support so it really is easy to use. Here's information about its single-cell capabilities.

0

It's also not free, so keep that in mind. Licenses are not cheap.
