Introduction to Bioinformatics using LINUX
http://www.prstatistics.com/course/introduction-to-bioinformatics-using-linux-ibul02/
Instructor: Dr. Martin Jones
This course will run from 16th - 20th October at SCENE (the Scottish Centre for Ecology and the Natural Environment), Loch Lomond National Park, Glasgow.
Course overview: Most high-throughput bioinformatics work these days takes place on the Linux command line. The programs which do the majority of the computational heavy lifting — genome assemblers, read mappers, and annotation tools — are designed to work best when used with a command-line interface. Because the command line can be an intimidating environment, many biologists learn the bare minimum needed to get their analysis tools working. This means that they miss out on the power of Linux to customize their environment and automate many parts of the bioinformatics workflow. This course will introduce the Linux command line environment from scratch and teach students how to make the most of its tools to achieve a high level of productivity when working with biological data.
Course programme
Day 1
- Session 1 - The design of Linux
In the first session we briefly cover the design of Linux: how is it different from Windows/OSX and how is it best used? We'll then jump straight onto the command line and learn about the layout of the Linux filesystem and how to navigate it. We'll describe Linux's file permission system (which often trips up beginners), how paths work, and how we actually run programs on the command line. We'll learn a few tricks for using the command line more efficiently, and how to deal with programs that are misbehaving. We'll finish this session by looking at the built-in help system and how to read and interpret manual pages.
- Session 2 - System management
We'll first look at a few command line tools for monitoring the status of the system and keeping track of what's happening to processor power, memory, and disk space. We'll go over the process of installing new software from the built-in repositories (which is easy) and from source code downloads (which is trickier). We'll also introduce some tools for benchmarking software (measuring the time/memory requirements of processing large datasets).
Day 2
- Session 3 - Manipulating tabular data
Many data types we want to work with in bioinformatics are stored as tabular plain text files, and here we learn all about manipulating tabular data on the command line. We'll start with simple things like extracting columns, filtering, sorting, and searching for text, before moving on to more complex tasks like searching for duplicated values, summarizing large files, and combining simple tools into long commands.
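As a rough illustration of the operations named in this session (extracting a column, filtering, sorting, and finding duplicated values), here is a minimal Python sketch. The course itself teaches these tasks with command line tools; the table contents, column names, and function names below are hypothetical and chosen only for the example.

```python
import csv
import io
from collections import Counter

# Hypothetical tab-separated table standing in for a real file (not course data).
TABLE = (
    "gene\tchrom\tlength\n"
    "geneA\tchr1\t1200\n"
    "geneB\tchr2\t300\n"
    "geneA\tchr1\t1200\n"
    "geneC\tchr1\t950\n"
)


def read_rows(text):
    """Parse tab-separated text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def extract_column(rows, name):
    """Return the values of one column, in file order."""
    return [row[name] for row in rows]


def filter_and_sort(rows, min_length):
    """Keep rows with length >= min_length, sorted longest first."""
    kept = [row for row in rows if int(row["length"]) >= min_length]
    return sorted(kept, key=lambda row: int(row["length"]), reverse=True)


def duplicated_values(rows, name):
    """Return the values that occur more than once in a column."""
    counts = Counter(extract_column(rows, name))
    return sorted(value for value, n in counts.items() if n > 1)


rows = read_rows(TABLE)
print(extract_column(rows, "gene"))
print([row["gene"] for row in filter_and_sort(rows, 900)])
print(duplicated_values(rows, "gene"))
```

Each function does one small job, so they can be chained in the same spirit as combining simple command line tools into a longer command.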
- Session 4 - Constructing pipelines
In this session we will look at the various tools Linux has for constructing pipelines out of individual commands. Aliases, shell redirection, pipes, and shell scripting will all be introduced here. We'll also look at a couple of specific tools to help with running tools on multiple processors, and for monitoring the progress of long-running tasks.
Day 3
- Session 5 - EMBOSS
EMBOSS is a suite of bioinformatics command-line tools explicitly designed to work in the Linux paradigm. We'll get an overview of the different sequence data formats that we might expect to work with, and put what we learned about shell scripting to biological use by building a pipeline to compare codon usage across two collections of DNA sequences.
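To make the idea of comparing codon usage concrete, here is a minimal Python sketch. It does not use EMBOSS and is not the pipeline built in the session; the sequences are invented, and the assumption that each sequence starts in reading frame 0 is made only for illustration.

```python
from collections import Counter


def codon_counts(sequences):
    """Count codons across sequences, reading each in frame 0.

    Trailing bases that do not fill a complete codon are ignored.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        usable = len(seq) - len(seq) % 3
        for i in range(0, usable, 3):
            counts[seq[i:i + 3]] += 1
    return counts


def codon_frequencies(sequences):
    """Return each codon's share of all codons in the collection."""
    counts = codon_counts(sequences)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


# Two hypothetical collections of DNA sequences (not course data).
collection_a = ["ATGGCCGCC", "ATGGCC"]
collection_b = ["ATGGCTGCT"]

freq_a = codon_frequencies(collection_a)
freq_b = codon_frequencies(collection_b)

# Print the two frequencies side by side for every codon seen in either set.
for codon in sorted(set(freq_a) | set(freq_b)):
    print(codon, round(freq_a.get(codon, 0.0), 2), round(freq_b.get(codon, 0.0), 2))
```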
- Session 6 - Using a Linux server
Often in bioinformatics we'll be working on a Linux server rather than our own computer, typically because we need access to more computing power, or to specialized tools and datasets. In this session we'll learn how to connect to a Linux server and how to manage sessions. We'll also consider the various ways of moving data between your own computer and a server, and finish with a discussion of the considerations we have to make when working on a shared computer.
Day 4
- Session 7 - Combining methods
In the next two sessions — i.e. one full day — we'll put everything we have learned together and implement a workflow for next-gen sequence analysis. In this first session we'll carry out quality control on some paired-end Illumina data and map these reads to a reference genome. We'll then look at various approaches to automating this pipeline, allowing us to quickly do the same for a second dataset.
- Session 8 - Combining methods
The second part of the next-gen workflow is to call variants to identify SNPs between our two samples and the reference genome. We'll look at the VCF file format and figure out how to filter SNPs for read coverage and quality. By counting the number of SNPs between each sample and the reference we will try to figure out something about the biology of the two samples. We'll attempt to automate this analysis in various ways so that we could easily repeat the pipeline for additional samples.
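The filtering step described here can be sketched in a few lines of Python. This is only an illustration of the logic, not the tooling used in the session: the records and thresholds are invented, and the sketch assumes read depth is recorded under the `DP` key of the INFO column, which is a standard VCF key but is not guaranteed to be present in every file.

```python
# Hypothetical VCF lines (tab-separated); not data from the course.
VCF_LINES = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50.0\t.\tDP=30",
    "chr1\t200\t.\tC\tT\t10.0\t.\tDP=40",
    "chr1\t300\t.\tG\tA\t60.0\t.\tDP=5",
    "chr2\t400\t.\tT\tC\t99.0\t.\tDP=25;AF=0.5",
]


def parse_info(info):
    """Turn 'DP=30;AF=0.5' into a dict; keys without a value map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def filter_snps(lines, min_qual, min_depth):
    """Return (chrom, pos) for single-base variants passing both thresholds."""
    kept = []
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        chrom, pos, _id, ref, alt, qual, _filter, info = line.split("\t")[:8]
        if len(ref) != 1 or len(alt) != 1:
            continue  # keep simple SNPs only; indels and multi-allelic sites are skipped
        depth = parse_info(info).get("DP")
        if qual == "." or depth is None or depth is True:
            continue  # missing quality or depth: cannot be assessed
        if float(qual) >= min_qual and int(depth) >= min_depth:
            kept.append((chrom, int(pos)))
    return kept


# Thresholds are arbitrary values chosen for the example.
snps = filter_snps(VCF_LINES, min_qual=30.0, min_depth=10)
print(snps)
print(len(snps))
```

Counting the records that survive the filter for each sample gives the per-sample SNP totals that the session goes on to compare.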
Day 5
- Session 9 - Customization
Part of the Linux design is that everything can be customized. This can be intimidating at first but, given that bioinformatics work is often fairly repetitive, can be used to good effect. Here we'll learn about environment variables, custom prompts, soft links, and ssh configuration — a collection of tools with modest capabilities, but which together can make life on the command line much more pleasant. In this last session there will also be time to continue working on the next-gen sequencing pipeline.
The afternoon of Friday 20th is reserved for finishing off the next-gen workflow exercise, working on your own datasets, or leaving early for travel.
Please send inquiries to oliverhooker@prstatistics.com or visit the website www.prstatistics.com
Please feel free to distribute this information anywhere you think suitable.
Upcoming courses - email for details oliverhooker@prstatistics.com
ADVANCES IN MULTIVARIATE ANALYSIS OF SPATIAL ECOLOGICAL DATA USING R #MVSP 3rd – 7th April 2017, Scotland, Prof. Pierre Legendre, Dr. Olivier Gauthier http://www.prstatistics.com/course/advances-in-spatial-analysis-of-multivariate-ecological-data-theory-and-practice-mvsp02/
ADVANCING IN STATISTICAL MODELLING FOR EVOLUTIONARY BIOLOGISTS AND ECOLOGISTS USING R #ADVR 17th – 21st April 2017, Scotland, Dr. Luc Bussiere, Dr. Ane Timenes Laugen http://www.prstatistics.com/course/advancing-statistical-modelling-using-r-advr06/
CODING, DATA MANAGEMENT AND SHINY APPLICATIONS USING RSTUDIO FOR EVOLUTIONARY BIOLOGISTS AND ECOLOGISTS #CDSR 15th - 19th May, Scotland, Dr. Aline Quadros http://www.prstatistics.com/course/coding-data-management-and-shiny-applications-using-rstudio-for-evolutionary-biologists-and-ecologists-cdsr01/
GEOMETRIC MORPHOMETRICS USING R #GMMR 5th – 9th June 2017, Scotland, Prof. Dean Adams, Prof. Michael Collyer, Dr. Antigoni Kaliontzopoulou http://www.prstatistics.com/course/geometric-morphometrics-using-r-gmmr01/
MULTIVARIATE ANALYSIS OF SPATIAL ECOLOGICAL DATA #MASE 19th – 23rd June, Canada, Prof. Subhash Lele, Dr. Peter Solymos http://www.prstatistics.com/course/multivariate-analysis-of-spatial-ecological-data-using-r-mase01/
TIME SERIES MODELS FOR ECOLOGISTS USING R (JUNE 2017) #TSME 26th – 30th June, Canada, Dr. Andrew Parnell http://www.prstatistics.com/course/time-series-models-foe-ecologists-tsme01/
BIOINFORMATICS FOR GENETICISTS AND BIOLOGISTS #BIGB 3rd – 7th July 2017, Scotland, Dr. Nic Blouin, Dr. Ian Misner http://www.prstatistics.com/course/bioinformatics-for-geneticists-and-biologists-bigb02/
META-ANALYSIS IN ECOLOGY, EVOLUTION AND ENVIRONMENTAL SCIENCES #METR01 24th – 28th July, Scotland, Prof. Julia Koricheva, Prof. Elena Kulinskaya http://www.prstatistics.com/course/meta-analysis-in-ecology-evolution-and-environmental-sciences-metr01/
SPATIAL ANALYSIS OF ECOLOGICAL DATA USING R #SPAE 7th – 12th August 2017, Scotland, Prof. Jason Matthiopoulos, Dr. James Grecian http://www.prstatistics.com/course/spatial-analysis-ecological-data-using-r-spae05/
ECOLOGICAL NICHE MODELLING USING R #ENMR 16th – 20th October 2017, Scotland, Dr. Neftali Sillero http://www.prstatistics.com/course/ecological-niche-modelling-using-r-enmr01/
INTRODUCTION TO BIOINFORMATICS USING LINUX #IBUL 16th – 20th October, Scotland, Dr. Martin Jones http://www.prstatistics.com/course/introduction-to-bioinformatics-using-linux-ibul02/
GENETIC DATA ANALYSIS AND EXPLORATION USING R #GDAR 23rd – 27th October, Wales, Dr. Thibaut Jombart, Zhian Kamvar http://www.prstatistics.com/course/genetic-data-analysis-exploration-using-r-gdar03/
STRUCTURAL EQUATION MODELLING FOR ECOLOGISTS AND EVOLUTIONARY BIOLOGISTS USING R #SEMR 23rd – 27th October, Wales, Prof. Jarrett Byrnes, Dr. Jon Lefcheck http://www.prstatistics.com/course/structural-equation-modelling-for-ecologists-and-evolutionary-biologists-semr01/
LANDSCAPE (POPULATION) GENETIC DATA ANALYSIS USING R #LNDG 6th – 10th November, Wales, Prof. Rodney Dyer http://www.prstatistics.com/course/landscape-genetic-data-analysis-using-r-lndg02/
APPLIED BAYESIAN MODELLING FOR ECOLOGISTS AND EPIDEMIOLOGISTS #ABME 20th - 25th November 2017, Scotland, Prof. Jason Matthiopoulos, Dr. Matt Denwood http://www.prstatistics.com/course/applied-bayesian-modelling-ecologists-epidemiologists-abme03/
INTRODUCTION TO REMOTE SENSING AND GIS APPLICATIONS FOR ECOLOGISTS #IRMS 27th Nov – 1st Dec, Wales, Dr. Duccio Rocchini, Dr. Luca Delucchi http://www.prstatistics.com/course/introduction-to-remote-sensing-and-gis-for-ecological-applications-irms01/
INTRODUCTION TO PYTHON FOR BIOLOGISTS #IPYB 27th Nov – 1st Dec, Wales, Dr. Martin Jones http://www.prstatistics.com/course/introduction-to-python-for-biologists-ipyb04/
DATA VISUALISATION AND MANIPULATION USING PYTHON #DVMP 11th – 15th December 2017, Wales, Dr. Martin Jones http://www.prstatistics.com/course/data-visualisation-and-manipulation-using-python-dvmp01/
ADVANCING IN STATISTICAL MODELLING USING R #ADVR 11th – 15th December 2017, Wales, Dr. Luc Bussiere, Dr. Tom Houslay, Dr. Ane Timenes Laugen http://www.prstatistics.com/course/advancing-statistical-modelling-using-r-advr07/
INTRODUCTION TO BAYESIAN HIERARCHICAL MODELLING #IBHM 29th Jan – 2nd Feb 2018, Scotland, Dr. Andrew Parnell http://www.prstatistics.com/course/introduction-to-bayesian-hierarchical-modelling-using-r-ibhm02/
ANIMAL MOVEMENT ECOLOGY (February 2018) #ANME ??th - ??th February 2018, Wales, Dr. Luca Borger, Dr. John Fieberg
AQUATIC TELEMETRY DATA ANALYSIS USING R (TBC) #ATDAR ??th - ??th February 2018, Wales
FUNCTIONAL ECOLOGY FROM ORGANISM TO ECOSYSTEM: THEORY AND COMPUTATION #FEER 5th – 9th March 2018, Scotland, Dr. Francesco de Bello, Dr. Lars Götzenberger, Dr. Carlos Carmona http://www.prstatistics.com/course/functional-ecology-from-organism-to-ecosystem-theory-and-computation-feer01/
STABLE ISOTOPE MIXING MODELS USING SIAR, SIBER AND MIXSIAR #SIMM Dr. Andrew Parnell, Dr. Andrew Jackson – Date and location to be confirmed
NETWORK ANALYSIS FOR ECOLOGISTS USING R #NTWA Dr. Marco Scotti - Date and location to be confirmed
MODEL BASED MULTIVARIATE ANALYSIS OF ABUNDANCE DATA USING R #MBMV0 Prof. David Warton - Date and location to be confirmed
ADVANCED PYTHON FOR BIOLOGISTS #APYB Dr. Martin Jones - Date and location to be confirmed
PHYLOGENETIC DATA ANALYSIS USING R (TBC) #PHYL Dr. Emmanuel Paradis – Date and location to be confirmed
Oliver Hooker PhD. PR statistics
Most recent publication - The physiological costs of prey switching reinforce foraging specialization - Journal of Animal Ecology - http://onlinelibrary.wiley.com/doi/10.1111/1365-2656.12632/full
prstatistics.com
facebook.com/prstatistics/
twitter.com/PRstatistics
groups.google.com/d/forum/pr-statistics-post-course-forum
prstatistics.com/organiser/oliver-hooker/
3/1, 128 Brunswick Street
Glasgow
G1 1TF
+44 (0) 7966500340