To the best of my knowledge, there are currently no tutorials available for immune repertoire sequencing (RepSeq) data analysis. On the other hand, the number of published dedicated software tools is steadily growing, and the topic appears to be of interest to the community (e.g., "TCR sequence analysis", "How to analyze TCR-repertoire from raw data of full length RNA-Seq SmartSeq2?" and "Immune repertoire sequencing community"). I have therefore decided to publish my own tutorial, originally prepared for an EMBO practical course.
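The tutorial covers the basics of RepSeq data processing and analysis. It focuses on T-cell receptor (TCR) amplicon libraries prepared using a molecular tagging approach, in which each starting molecule is labeled with a unique molecular identifier (UMI). Because all reads carrying the same UMI derive from the same molecule, molecular tagging greatly facilitates both quantification and error correction, which are crucial for highly complex repertoire sequencing data.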
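To illustrate why molecular tags help, here is a minimal sketch of UMI-based consensus assembly, assuming reads arrive as hypothetical `(umi, sequence)` pairs that are already aligned to a common start. The function names `assemble_umi_consensus` and `count_molecules` and the `min_reads` threshold are illustrative assumptions, not the tutorial's code or any specific tool's API. Real pipelines also handle indels, UMI sequencing errors and read pairing, which this sketch ignores.

```python
from collections import Counter, defaultdict


def assemble_umi_consensus(reads, min_reads=3):
    """Collapse reads that share a UMI into one consensus sequence.

    reads: iterable of (umi, sequence) pairs. Assumes reads within a UMI
    group are already aligned to a common start (a simplification).
    Returns {umi: consensus} for UMIs supported by >= min_reads reads.
    """
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)

    consensus = {}
    for umi, seqs in groups.items():
        if len(seqs) < min_reads:
            continue  # too few reads to out-vote sequencing errors
        length = min(len(s) for s in seqs)  # truncate to the shortest read
        # Majority vote at each position corrects sporadic base-call errors.
        consensus[umi] = "".join(
            Counter(s[i] for s in seqs).most_common(1)[0][0] for i in range(length)
        )
    return consensus


def count_molecules(consensus):
    """Quantify each distinct sequence by the number of UMIs (molecules) behind it."""
    return Counter(consensus.values())


# Toy input: three reads for UMI "AAAC" (one with an error), one read for "GGTT".
reads = [
    ("AAAC", "TGTGCC"),
    ("AAAC", "TGTGCC"),
    ("AAAC", "TGAGCC"),
    ("GGTT", "TGTGCA"),
]
consensus = assemble_umi_consensus(reads)
print(consensus)                   # {'AAAC': 'TGTGCC'}
print(count_molecules(consensus))  # Counter({'TGTGCC': 1})
```

Counting UMIs rather than reads removes PCR amplification bias from the abundance estimate, and discarding poorly covered UMIs trades some sensitivity for more reliable consensus sequences.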
The tutorial includes:
- Pre-processing of raw sequencing reads and molecular tag-based consensus assembly
- Mapping of TCR Variable, Diversity and Joining segments and TCR clonotype assembly
- Estimating the diversity of a T-cell repertoire sample
- Calculating similarity between TCR repertoires
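The last two steps can be sketched with a few metrics commonly borrowed from ecology: Shannon entropy and the bias-corrected Chao1 richness estimator for diversity, and the Jaccard index for repertoire similarity. This is a minimal sketch assuming clonotype tables are plain `Counter` objects mapping CDR3 sequences to molecule counts; the tables are made-up toy data, and the tutorial itself may use different metrics or implementations.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy H = -sum(p_i * ln p_i) over clonotype frequencies."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness estimate: S_obs + F1*(F1-1) / (2*(F2+1))."""
    s_obs = sum(1 for c in counts.values() if c > 0)
    f1 = sum(1 for c in counts.values() if c == 1)  # singletons
    f2 = sum(1 for c in counts.values() if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def jaccard(counts_a, counts_b):
    """Fraction of distinct clonotypes shared by two repertoires (ignores abundance)."""
    a, b = set(counts_a), set(counts_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy clonotype tables (CDR3 -> molecule count); values are made up.
sample_a = Counter({"CASSLGQETQYF": 5, "CASSPGTEAFF": 1, "CAWSVGNTIYF": 1, "CASRQGYEQYF": 2})
sample_b = Counter({"CASSLGQETQYF": 3, "CAWSVGNTIYF": 2, "CATSDLNTEAFF": 1})

print(round(shannon_entropy(sample_a), 3))
print(chao1(sample_a))              # 4.5
print(jaccard(sample_a, sample_b))  # 0.4
```

All of these estimates depend on sequencing depth, so samples are usually downsampled to the same number of molecules before their diversity or overlap is compared.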
The full tutorial is available here; all necessary datasets and software are deposited on GitHub.
Update
I have just released another, mostly R-based tutorial, available at antigenomics/repseq-annotation-tutorial. It covers comparative analysis of T-cell repertoires and focuses on the following properties of RepSeq data:
- Repertoire diversity
- TCR segment usage
- TCR sequence sharing
- TCR sequence annotation using a curated database of TCR sequences with known specificity
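Although that tutorial is R-based, the core operations behind segment usage, sequence sharing and database annotation are simple enough to sketch in a few lines of Python. This is a minimal sketch under stated assumptions: clonotypes are hypothetical `(V segment, CDR3 amino-acid sequence, count)` tuples, the CDR3 sequences and the `toy_db` entry with its placeholder label `epitope_X` are invented, and annotation uses exact CDR3 matching only, which is the simplest possible rule rather than the tutorial's method.

```python
from collections import Counter, defaultdict

# Hypothetical clonotype tables: (V segment, CDR3 amino-acid sequence, count).
samples = {
    "donor1": [("TRBV20-1", "CASSLGQETQYF", 10), ("TRBV7-9", "CASSPGTEAFF", 4)],
    "donor2": [("TRBV20-1", "CASSLGQETQYF", 2), ("TRBV5-1", "CASSLEGYTF", 6)],
}


def v_usage(clonotypes, weighted=True):
    """Fraction of molecules (weighted) or clonotypes (unweighted) per V segment."""
    usage = Counter()
    for v, _, count in clonotypes:
        usage[v] += count if weighted else 1
    total = sum(usage.values())
    return {v: n / total for v, n in usage.items()}


def shared_cdr3(samples, min_samples=2):
    """CDR3 sequences observed in at least `min_samples` samples."""
    seen = defaultdict(set)
    for name, clonotypes in samples.items():
        for _, cdr3, _ in clonotypes:
            seen[cdr3].add(name)
    return {cdr3: names for cdr3, names in seen.items() if len(names) >= min_samples}


def annotate(clonotypes, database):
    """Exact-match lookup of CDR3 sequences in a {cdr3: specificity} mapping."""
    return [(cdr3, database[cdr3]) for _, cdr3, _ in clonotypes if cdr3 in database]


toy_db = {"CASSLEGYTF": "epitope_X"}  # placeholder entry, not real data

print(v_usage(samples["donor1"]))
print(shared_cdr3(samples))
print(annotate(samples["donor2"], toy_db))  # [('CASSLEGYTF', 'epitope_X')]
```

Weighted usage reflects clonal expansions, while unweighted usage reflects the diversity of rearrangements; which one is more informative depends on the question being asked.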
The main idea is to infer sample metadata (such as donor ID, T-cell subset and T-cell phenotype) from raw RepSeq data.
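As one illustration of metadata inference, the donor of an unlabeled sample can be guessed by finding the reference repertoire with which it shares the most clonotypes, since samples from the same donor typically overlap far more than samples from different donors. The sketch below is a hypothetical nearest-reference heuristic using the overlap coefficient; the names `overlap_coefficient` and `infer_label` and the toy CDR3 sets are assumptions for illustration, not the tutorial's approach, and inferring T-cell subset or phenotype would require other features.

```python
def overlap_coefficient(a, b):
    """|A & B| / min(|A|, |B|): shared fraction of the smaller repertoire."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def infer_label(query, references):
    """Return the reference label whose CDR3 set overlaps the query most."""
    return max(references, key=lambda label: overlap_coefficient(query, references[label]))


# Toy reference repertoires keyed by (hypothetical) donor ID.
references = {
    "donor_A": {"CASSLGQETQYF", "CAWSVGNTIYF", "CASRQGYEQYF"},
    "donor_B": {"CATSDLNTEAFF", "CASSPGTEAFF"},
}
query = {"CASSLGQETQYF", "CAWSVGNTIYF", "CASKLAGGEQYF"}

print(infer_label(query, references))  # donor_A
```

The overlap coefficient is used here instead of the Jaccard index so that a small query sample is not penalized for being much smaller than a deeply sequenced reference.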
Very thorough! Excellent.