We have 3 places left for the Workshop on "Data analysis in high throughput biology": https://www.physalia-courses.org/courses/course3/
Application deadline is: April 24th, 2017
Instructor:
Dr. January Weiner studied biology and mathematics at the Jagiellonian University in Cracow, graduating in experimental evolutionary biology in 1996. He then moved to microbiology and transcriptomics, and received his Ph.D. from the University of Heidelberg for work on the transcriptome of Mycoplasma pneumoniae. Later, he worked on the evolution of proteins in the bioinformatics group of Erich Bornberg-Bauer, and habilitated in 2009 in the area of evolutionary biology. Since 2009 he has worked at the Max Planck Institute for Infection Biology on high-throughput biomarkers in tuberculosis.
Course overview
High throughput (HT) techniques such as transcriptomics or metabolomics are of great significance in many areas of biology. With the standard techniques becoming more affordable and new techniques being introduced all the time, the number of data sets generated is staggering. However, statistical and computational analysis of HT data sets presents many challenges. In this course, the students will learn to independently process and analyse HT data sets, select the appropriate tools, and functionally interpret the results. They will also learn the paradigms of computational biology and statistics, which will allow them to communicate efficiently with computational biologists.
Intended audience
In general, the course is aimed at biologists who would like to take their data analysis into their own hands. While an aptitude for computational work is necessary, the main goal of the course is the application of biological and statistical knowledge to HT data sets with as little effort as necessary. Participants are expected to have:
- basic computer skills (a rudimentary knowledge of programming principles in any language is recommended, but not mandatory)
- basic understanding of statistics
- basic understanding of molecular techniques for generating high throughput data
The students should be comfortable with using a computer and have at least a rudimentary understanding of computer programming. However, no specific skills are necessary; the students will learn basic R programming in this course.
Basic skills in statistics are necessary. The students should understand the concepts of statistical hypothesis testing and p-values. However, an in-depth introduction to these concepts will also be provided.
Teaching format
On each day, the course will consist of five parts:
- Lecture: theoretical introduction to the day's focus.
- Hands-on guide: guided practical session in R where students replicate the analysis performed by the teacher. While the lecture is general, here specific R techniques and R packages are introduced.
- Guided self-study: students are given exercises and problems to solve and work on them individually under the guidance of the teacher.
- Individual project work: each student will receive a transcriptomic (RNASeq or microarray) data set to analyse throughout the course.
- Lecture: wrap-up and side notes; preparation for the following day
Target student skills
- Overview of commonly used high-throughput data types
- Techniques for data clean-up and preparation for analysis
- Understanding of computational problems associated with high-throughput data analysis
- Statistical problems and solutions in analysis of HT data
- Practical skills in analysis methods of HT data:
  - basic differential analysis (limma, DESeq, alternative and non-parametric techniques)
  - set enrichment techniques (GSEA, gene ontologies, metabolic profiling and more)
  - multivariate approaches to data analysis (PCA / ICA, PLS, multiple correspondence analysis)
  - basic approaches in machine learning: cross
- Communication skills in statistics and computational biology
After the course, the student should be able to prepare, analyse and interpret an HT data set, including with multivariate and machine learning techniques.
Programme: https://www.physalia-courses.org/courses/course3/curriculum3/