Course: GENOME ASSEMBLY USING OXFORD NANOPORE SEQUENCING
Course website: https://www.physalia-courses.org/courses-workshops/course59/
Where: Free University Berlin (Germany)
When: 10 – 14 February 2020
COURSE OVERVIEW
New advances in sequencing technologies have opened the door to more contiguous genome assemblies thanks to the increased length of the obtained fragments. Although the longer fragments come at the cost of lower accuracy, a broad range of algorithms has been developed to cope with the higher error rate.
This course will introduce the audience to a spectrum of methods found in a typical assembly workflow, starting from raw data and finishing with a fully assembled genome. We will see how to obtain nucleotide sequences from raw signals, dive deeper into the most widely used assembly paradigm for long fragments, try out and compare several state-of-the-art assemblers, and finally assess the quality of the obtained assembly with and without a reference genome.
Structured over five days, this course combines theoretical and practical parts, which are intertwined throughout each day. The presented theoretical foundation will be applied to small bacterial datasets and visualized in order to better grasp the algorithms at hand.
TARGET AUDIENCE
This course is intended for researchers interested in learning the concepts behind algorithms for de novo genome assembly with Oxford Nanopore Technologies data. Both beginners and more advanced users will find useful information in the presented material. Course attendees should bring a laptop running either macOS or any Unix version. Some experience in using these operating systems via the command line is desirable, but we will cover the needed essentials throughout the hands-on sessions.
LEARNING OUTCOMES
- Learn the advantages and disadvantages of third-generation sequencing.
- Understand the concepts of de novo genome assembly.
- Obtain practical experience in using state-of-the-art tools for de novo assembly and assembly quality assessment.
PROGRAM
Monday – Classes from 09:30 to 17:30
Session1: Introduction
This course starts with a general introduction to sequencing and assembly. The audience will become familiar with Oxford Nanopore sequencing: how it works, its advantages and its disadvantages. Afterwards, we will transform a subset of a bacterial dataset, containing electric current signals, into a set of nucleotide sequences with an error rate higher than that of previous generations of sequencing.
Session2: Stitching fragments
Sequencing technologies are still unable to read a whole genome at once, so the obtained fragments need to be joined together. We will first try to use sequence alignment, the basis of many bioinformatics tools. As alignment alone is not feasible for larger amounts of data, we will investigate a heuristic approach that uses short substrings of predefined length (Minimap). We will discuss the trade-off between execution time and sensitivity and its impact on assembly contiguity, and apply this method to a small bacterial dataset.
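As a rough illustration of the substring heuristic described above, the following Python sketch picks one representative substring (a minimizer) from each window of consecutive substrings and reports which representatives two fragments share. It is a simplified teaching sketch, not Minimap's implementation: the function names, the default values of `k` and `w`, and the use of plain lexicographic ordering are assumptions made for this example.

```python
def minimizers(seq, k=5, w=4):
    """Return the set of (k-mer, position) minimizers of seq.

    Assumption for this sketch: in every window of w consecutive k-mers,
    the lexicographically smallest k-mer is kept as the representative.
    """
    kmers = [(seq[i:i + k], i) for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        # min() compares the k-mer string first, then the position.
        picked.add(min(kmers[start:start + w]))
    return picked


def shared_minimizers(a, b, k=5, w=4):
    """Return the k-mers that are minimizers in both fragments."""
    in_a = {kmer for kmer, _ in minimizers(a, k, w)}
    in_b = {kmer for kmer, _ in minimizers(b, k, w)}
    return in_a & in_b


if __name__ == "__main__":
    # Two toy fragments that overlap by 11 bases (end of the first, start of the second).
    read_a = "ACGTTGCATGCCGATAGCTA"
    read_b = "GCCGATAGCTAGGCTTAACG"
    print(sorted(shared_minimizers(read_a, read_b)))
```

Only fragment pairs that share representatives need to be examined further. In this sketch, a larger `w` keeps fewer substrings per fragment, which reduces the work but can miss short overlaps; this is one way to see the trade-off between execution time and sensitivity.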
Tuesday – Classes from 09:30 to 17:30
Session3: Unknotting graphs
Given the set of pairwise overlaps between fragments, we will build an assembly graph from which the genome can be reconstructed (Miniasm). The graph will look like a ball of yarn due to the sheer number of overlaps. Step by step, we will introduce and apply several simplification methods to untangle the graph. Some knots caused by sequencing errors will still remain in the graph. We will examine and try to resolve them. Afterwards, contiguous chains of fragments will be extracted and used in the next phases.
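To give a flavour of what a simplification step can look like, here is a minimal Python sketch of one commonly described technique, the removal of transitive edges: if fragment A overlaps B, B overlaps C, and A also overlaps C, the edge from A to C adds no information and can be dropped. This is an illustration under strong assumptions (a plain directed graph stored as a dictionary, no overlap lengths, no strand information); the function name is hypothetical and the sketch does not reproduce how any particular assembler implements this step.

```python
def remove_transitive_edges(graph):
    """Return a copy of graph without transitive edges.

    graph maps each node to the set of its successors. An edge a -> c is
    dropped whenever some b exists with a -> b and b -> c.
    """
    reduced = {node: set(successors) for node, successors in graph.items()}
    for a, successors in graph.items():
        for b in successors:
            for c in graph.get(b, ()):
                # Skip self-loops on b so the edge a -> b itself is kept.
                if c != b and c in successors:
                    reduced[a].discard(c)
    return reduced


if __name__ == "__main__":
    # r1 overlaps r2 and r3, and r2 overlaps r3: the edge r1 -> r3 is redundant.
    overlaps = {"r1": {"r2", "r3"}, "r2": {"r3"}, "r3": set()}
    print(remove_transitive_edges(overlaps))
```

After this kind of reduction, a region without repeats or errors tends to collapse into a simple chain of fragments, which is easier to inspect visually.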
Session4: Polishing until it shines
Contigs extracted from the assembly graph will be only as accurate as the sequenced fragments themselves and will be unusable for most downstream analyses. Therefore, we will map all fragments to the assembly and create a multiple sequence alignment with partial order graphs (Racon). By retaining the most frequent base across all fragments at each assembly position, we will iteratively try to increase the overall accuracy. Once the accuracy stops improving, we will see if we can improve the assembly further by using signal-level data (Nanopolish).
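The idea of retaining the most frequent base at each position can be sketched in a few lines of Python. The sketch assumes the fragments have already been aligned into rows of equal length, with `-` marking a gap; building that alignment (the part handled by partial order graphs) is not shown, and the function name is hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_rows):
    """Return the per-column majority base of pre-aligned rows.

    Assumption for this sketch: all rows have the same length and use '-'
    for gaps. Columns where the gap is most frequent are left out.
    """
    if not aligned_rows:
        return ""
    length = len(aligned_rows[0])
    if any(len(row) != length for row in aligned_rows):
        raise ValueError("all aligned rows must have the same length")
    consensus = []
    for column in zip(*aligned_rows):
        base, _count = Counter(column).most_common(1)[0]
        if base != "-":
            consensus.append(base)
    return "".join(consensus)


if __name__ == "__main__":
    rows = ["ACGTA", "ACCTA", "ACGTA", "A-GTA"]
    print(majority_consensus(rows))
```

Because errors in individual fragments rarely hit the same position in the same way, the majority base is usually correct once enough fragments cover a position.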
Wednesday – Classes from 09:30 to 17:30
Session5: Quality assessment
The quality of an assembly is important for downstream analysis, so we will assess it from three aspects: base accuracy (MUMmer) and completeness (QUAST-LG), both measured against the reference genome, and protein prediction, using orthologs (BUSCO) and ORFs (Ideel). We will cover each tool and apply it to our assembly.
Session6: State-of-the-art
We will go through the basic concepts of several state-of-the-art assemblers such as Canu, Redbean and Flye. We will apply each of them to the same dataset and compare them in terms of contiguity, accuracy and the amount of resources needed.
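Contiguity is often summarized with the N50 statistic: the length of the contig at which, going from the longest contig downwards, at least half of the total assembly length has been covered. The program text does not name a specific metric, so the choice of N50 here is an assumption made for illustration. A minimal Python sketch:

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0


if __name__ == "__main__":
    # Total length is 290, so half is 145; 80 + 70 = 150 reaches it.
    print(n50([80, 70, 50, 40, 30, 20]))  # prints 70
```

A higher N50 for the same total length indicates fewer and longer contigs, but it says nothing about correctness, which is why contiguity is evaluated together with accuracy.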
Thursday – Classes from 09:30 to 17:30
Session7: State-of-the-art continued
Session8: Group task
Attendees will get several sets of fragments obtained with Oxford Nanopore sequencing, ranging in size from a couple of megabytes to a hundred megabytes. The task will be to assemble as many of the datasets as possible with different assemblers and to evaluate the quality of each assembly. Participants will work in groups of two or three. We also encourage participants to bring their own data if they consider it interesting to assemble.
Friday – Classes from 09:30 to 17:30
Session9: Group task continued
Session10: Presentations
Each group will present the results of their work, followed by a general discussion about the group task and the course itself.