I would like to announce a set of Python scripts and modules I have written for the analysis and processing of long-read sequencing data from Oxford Nanopore Technologies and Pacific Biosciences. They are available on GitHub and can be installed with pip or conda.
Collectively they are called NanoPack, and installing NanoPack installs all of the scripts at once.
Scripts
NanoPlot: creates a range of plots from reads (fastq), alignments (bam) and albacore summary files. Examples can be found in the gallery on my blog.
NanoComp: compares multiple runs on read length and quality, based on reads (fastq), alignments (bam) or albacore summary files.
NanoQC: generates plots to investigate nucleotide composition and quality distribution at the ends of reads.
NanoFilt: streaming script that filters a fastq file on a minimum length and a minimum quality cut-off. It can optionally also trim nucleotides from either end of the reads.
NanoStat: quickly creates a statistical summary from reads, an alignment or a summary file.
NanoLyse: streaming script that filters a fastq file to remove reads mapping to the lambda phage genome (the control DNA used in nanopore sequencing). It uses minimap2/mappy from lh3.
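For readers wondering what streaming length/quality filtering with optional trimming involves, here is a minimal stdlib-only sketch. It is not NanoFilt's code: the function and parameter names (read_fastq, mean_quality, filter_reads, headcrop, tailcrop) are hypothetical, it assumes unwrapped 4-line fastq records with Phred+33 qualities, and it assumes the mean quality is computed by averaging per-base error probabilities rather than raw Phred scores. It also trims first and then applies the cut-offs; the real tool may order these steps differently.

```python
import math
from typing import Iterable, Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield (header, sequence, quality) from an unwrapped 4-line fastq stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return  # end of input
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip()
        yield header, seq, qual


def mean_quality(qual: str) -> float:
    """Mean Phred quality via error probabilities (Phred+33 encoding assumed)."""
    if not qual:
        return 0.0
    # Convert each Phred score to an error probability, average, convert back.
    probs = [10 ** ((ord(c) - 33) / -10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_reads(records: Iterable[Record], min_length: int = 0,
                 min_quality: float = 0.0, headcrop: int = 0,
                 tailcrop: int = 0) -> Iterator[Record]:
    """Trim bases from both ends, then keep non-empty reads passing both cut-offs."""
    for header, seq, qual in records:
        end = max(len(seq) - tailcrop, 0)  # clamp so over-trimming yields ""
        seq, qual = seq[headcrop:end], qual[headcrop:end]
        if seq and len(seq) >= min_length and mean_quality(qual) >= min_quality:
            yield header, seq, qual


def stream_filter(in_handle: TextIO, out_handle: TextIO, **cutoffs) -> None:
    """Read fastq from in_handle and write passing records to out_handle."""
    for header, seq, qual in filter_reads(read_fastq(in_handle), **cutoffs):
        out_handle.write(f"{header}\n{seq}\n+\n{qual}\n")
```

Calling `stream_filter(sys.stdin, sys.stdout, min_length=500, min_quality=7)` processes one record at a time, so memory use does not grow with file size. Averaging error probabilities lets a few very poor bases pull the mean down: a read with one Q10 and one Q40 base averages to roughly Q13, not Q25.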
Modules
nanoget: functions for extracting features from reads, alignments and albacore summary data.
nanomath: functions for mathematical processing and calculating statistics.
nanoplotter: plotting functions, relying heavily on seaborn and, for some plots, on plotly and bokeh.
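As an illustration of the kind of read-length statistics such a summary reports, here is a small stdlib-only sketch of N50 plus a few basic metrics. It is not nanomath's API: the functions n50 and summarize and the dictionary keys are hypothetical. N50 is taken here as the length L such that reads of length at least L together contain at least half of all sequenced bases.

```python
from statistics import median
from typing import Dict, Sequence


def n50(lengths: Sequence[int]) -> int:
    """Length L such that reads >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("n50 requires at least one read length")
    ordered = sorted(lengths, reverse=True)  # longest reads first
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return ordered[-1]  # not reached for non-empty input


def summarize(lengths: Sequence[int]) -> Dict[str, float]:
    """Basic read-length summary (key names are illustrative only)."""
    return {
        "number_of_reads": len(lengths),
        "total_bases": sum(lengths),
        "mean_read_length": sum(lengths) / len(lengths),
        "median_read_length": median(lengths),
        "read_length_N50": n50(lengths),
    }
```

For read lengths 2 through 10 (54 bases in total), the cumulative sum of the longest reads reaches half the total (27) at the read of length 8, so the N50 is 8, while the median is 6. N50 is weighted toward longer reads, which is why it is often reported alongside the mean and median for long-read data.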
I welcome all feedback, bug reports, suggestions and feature requests!
Citation
This set of scripts has been published in Bioinformatics.
From what I have read, guppy performs adapter trimming, but I found adapter contamination in my data. Is there a way to clean it up using your tools?