Hello guys,
I am new to the bioinformatic analysis of ONT data (MinION with a FLO-MIN106D flow cell). The lab sent me a collection of fast5 files that I have to analyze to obtain the metagenomic details (viruses). I have a new machine running Ubuntu 21 (Nvidia GE-3080, 96 GB RAM, 24 cores). The Nvidia GPU is already working under Ubuntu.
I know the first steps are basecalling and demultiplexing, and I would like to use guppy_basecaller (many manuscripts mention it). I searched for a detailed procedure for installing the ONT pipeline on Ubuntu, but I became very confused: do I need to install MinKNOW on Ubuntu? minion-nc? What does "live basecalling" mean? Is the Guppy package enough?
Thank you for your help.
Emilio
You can try running an existing pipeline such as https://nf-co.re/nanoseq. Once you are confident, search for other pipelines for ONT data and run them. Subsample your data first for a pilot run.
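For the pilot run, one simple way to subsample is to pick a random handful of FAST5 files before basecalling. This is only a sketch: the function name `pick_pilot_files`, the default of 10 files, and the directory layout are my assumptions, not anything prescribed by ONT or nanoseq.

```python
import random
from pathlib import Path


def pick_pilot_files(fast5_dir, n_files=10, seed=42):
    """Return a reproducible random subset of FAST5 files for a pilot run.

    fast5_dir, n_files and seed are illustrative parameters (assumptions).
    """
    # Search recursively, since runs often nest FAST5 files in subfolders.
    files = sorted(Path(fast5_dir).rglob("*.fast5"))
    if len(files) <= n_files:
        # Fewer files than requested: just use everything.
        return files
    # A fixed seed makes the pilot subset reproducible between runs.
    rng = random.Random(seed)
    return sorted(rng.sample(files, n_files))
```

You could then copy or symlink the returned files into a separate folder and point the basecaller at that folder.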
I was actually wondering about using FastQC and MultiQC with ONT data, given the read lengths. Can they really provide correct QC for ultra-long reads averaging ~100 kb?
The suggested pipeline uses both of them. Is that really OK?
Hi,
MinKNOW is the software that controls the MinION sequencing device and shows progress and visualizations while sequencing is running. During sequencing, live basecalling is done so that you can see the number of reads, how many passed or failed, and so on. Normally, people do the sequencing, collect the FAST5 files, and then run high-accuracy (HAC) or super-accuracy (SUP) basecalling with standalone Guppy. Hence, you do not need to worry about MinKNOW or live basecalling. You can simply install the GPU version of Guppy and run HAC or SUP basecalling. Once you have the FASTQ files, you can figure out what to do next.
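As a rough sketch of what the standalone Guppy call could look like, here is a small Python helper that only assembles the `guppy_basecaller` command line. The helper name `build_guppy_command` and the paths are hypothetical. The config name `dna_r9.4.1_450bps_hac.cfg` is my assumption for an R9.4.1 flow cell, and the barcode kit in the usage line is just a placeholder; check which configs ship with your Guppy version and which kit the lab actually used.

```python
import shlex


def build_guppy_command(input_dir, output_dir,
                        config="dna_r9.4.1_450bps_hac.cfg",
                        device="cuda:0", barcode_kit=None):
    """Assemble a guppy_basecaller command line as a list of arguments.

    The config default is an assumption (HAC model for R9.4.1 flow cells);
    swap in the SUP config if your Guppy version provides one.
    """
    cmd = [
        "guppy_basecaller",
        "--input_path", str(input_dir),   # folder with the FAST5 files
        "--save_path", str(output_dir),   # where the FASTQ output is written
        "--config", config,               # basecalling model configuration
        "--device", device,               # run on the GPU
        "--recursive",                    # also search subfolders
        "--compress_fastq",               # write gzipped FASTQ
    ]
    if barcode_kit:
        # Demultiplex during basecalling; the kit name must match the library prep.
        cmd += ["--barcode_kits", barcode_kit]
    return cmd


def as_shell_string(cmd):
    """Render the argument list as a copy-pasteable shell command."""
    return shlex.join(cmd)
```

For example, `print(as_shell_string(build_guppy_command("fast5", "basecalled", barcode_kit="EXP-NBD104")))` prints a command you can inspect before running it, and `subprocess.run(cmd, check=True)` would execute it once Guppy is installed.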
Thank you very much. The confusion comes from the fact that some people also use the workstation's computing resources to perform the "live basecalling". I suppose MinKNOW is needed in that case.