Next-generation sequencing (NGS) is a rich source of data for sample classification. Exploring the data with unsupervised methods like clustering has limits, since many samples can plausibly be placed in more than one cluster, so supervised classification trained on a labeled portion of the samples is a good alternative. However, powerful classification techniques like SVM do not cope well with typical gene expression tables containing thousands of genes, so it is useful to first apply a classifier with built-in feature selection, such as step-wise linear discriminant analysis (LDA), to reduce the number of features to something closer to the number of samples in the study. This approach improves the quality of classification compared with standard LDA and increases confidence in NGS-based classification. Machine learning techniques for NGS data are the subject of our series of workshops (https://edu.t-bio.info/workshops/) and of the Transcriptomics 3 course on edu.t-bio.info (https://edu.t-bio.info/course/transcriptomics-3/).
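To make the feature-reduction idea concrete, here is a rough sketch in plain Python. It is not the pipeline from the course or the platform. It is a simplified two-class stand-in for step-wise LDA that greedily adds the gene which most increases the separation between class means (d^T Sw^-1 d, with Sw the within-class scatter). The expression values, the gene names and the helper names (solve, fisher_score, forward_select) are all made up for illustration.

```python
# Toy sketch of greedy forward feature selection with a two-class LDA-style
# criterion. Data, gene names and helper names are hypothetical.

def solve(a, b):
    """Solve a x = b by Gaussian elimination (assumes a is non-singular)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fisher_score(rows, labels, feats):
    """Class separation d^T Sw^-1 d using only the columns in feats."""
    groups = [[r for r, y in zip(rows, labels) if y == c] for c in (0, 1)]
    means = [[sum(r[f] for r in g) / len(g) for f in feats] for g in groups]
    k = len(feats)
    sw = [[0.0] * k for _ in range(k)]  # pooled within-class scatter
    for g, mu in zip(groups, means):
        for r in g:
            dev = [r[f] - mu[i] for i, f in enumerate(feats)]
            for i in range(k):
                for j in range(k):
                    sw[i][j] += dev[i] * dev[j]
    d = [m1 - m0 for m0, m1 in zip(means[0], means[1])]
    w = solve(sw, d)
    return sum(di * wi for di, wi in zip(d, w))

def forward_select(rows, labels, n_keep):
    """Add one feature at a time, keeping the one that raises the score most."""
    selected, remaining = [], list(range(len(rows[0])))
    while len(selected) < n_keep:
        best = max(remaining,
                   key=lambda f: fisher_score(rows, labels, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

genes = ["geneA", "geneB", "geneC", "geneD"]  # made-up names
rows = [  # 8 samples x 4 genes, invented expression values
    [1.0, 5.0, 2.0, 7.0], [1.2, 4.0, 2.5, 6.0],
    [0.8, 4.0, 1.5, 6.0], [1.0, 3.0, 3.0, 7.0],
    [3.0, 3.1, 3.5, 6.0], [3.2, 4.1, 4.5, 7.0],
    [2.8, 4.1, 3.0, 7.0], [3.0, 5.1, 4.0, 6.0],
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

selected = forward_select(rows, labels, n_keep=2)
print([genes[i] for i in selected])  # ['geneA', 'geneC']
```

On a real table, the reduced set of columns would then be handed to a classifier such as an SVM. A proper step-wise LDA usually also applies a statistical entry/removal test (for example based on Wilks' lambda) and handles more than two classes, which this sketch leaves out.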
Soo..that's a beautifully colourful figure, but where's the tutorial? :D
Sorry, something went wrong with the formatting. The full tutorial is here: https://edu.t-bio.info/course/transcriptomics-3/
The quiz and the hands-on part of the course require using the company's product.
If you take the course, you will see that registering and requesting educational access gives you a free 14-day trial. The course itself is completely free, and the pipelines on the platform provide output in R that you can download and modify.