Hello Reader,
I hope you are in the best of health.
I am pretty new to Oxford Nanopore raw data. I have mostly seen it as a single FASTQ file per sample.
Recently I received Nanopore sequence data for our project, and I see 5 files per sample, named as follows.
- Isolate_1_fast5_fail.tar
- Isolate_1_fast5_pass.tar
- Isolate_1_fastQ_fail.tar
- Isolate_1_fastQ_pass.tar
- sequencing_summary_PAW77343_2a90c311_84f6a71c.txt
When I extract these archives (e.g. tar -xf Isolate_1_fast5_fail.tar), I get multiple files in each directory, as follows:
- Isolate_1_fastQ_pass contains 269 *.fastq.gz files
- Isolate_1_fastQ_fail contains 6 *.fastq.gz files
- Isolate_1_fast5_pass contains 269 *.fast5 files
- Isolate_1_fast5_fail contains 6 *.fast5 files
My intention is to perform de novo genome assembly. I know fast5 is the native output format for Nanopore.
Question 1: What does the Fail/Pass notation mean?
Question 2: How should I use the fastq files for downstream analysis? Should I zcat all the *.fastq.gz files into one fastq.gz file and use that as input to the assembler of choice?
Question 3: Which assembler is recommended for genome assembly of fungal Nanopore sequence data? I also have Illumina short-read sequence data for the same samples.
Your valuable feedback is welcome.
Thanks.
To add to colindaven's answer.
Q1 - Normally, reads that satisfy the criteria (qual >= 7.0 and length >= 0) are marked as passed.
Q2 - You can use cat; zcat is not needed.
You can run pycoQC using sequencing_summary_PAW77343_2a90c311_84f6a71c.txt. That gives you a nice graphical overview of the run.
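
To illustrate the Q2 point, here is a minimal Python sketch of why plain concatenation is enough: a gzip file may contain several compressed members back to back, so joining the *.fastq.gz files byte for byte (which is what cat does) yields a valid fastq.gz. The function names and the tiny made-up reads below are my own, purely for illustration, and the record count assumes the usual 4 lines per FASTQ record.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def concat_gzip(parts, out_path):
    """Byte-wise concatenation, equivalent to `cat a.gz b.gz > out.gz`."""
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return out_path


def count_fastq_records(gz_path):
    """Count records in a gzipped FASTQ, assuming 4 lines per record."""
    with gzip.open(gz_path, "rt") as handle:
        return sum(1 for _ in handle) // 4


def demo():
    """Write two tiny made-up chunks, merge them, and count the reads."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        parts = []
        for i in range(2):
            chunk = tmp / f"chunk_{i}.fastq.gz"
            with gzip.open(chunk, "wt") as fh:
                fh.write(f"@read_{i}\nACGT\n+\nIIII\n")
            parts.append(chunk)
        merged = concat_gzip(sorted(parts), tmp / "merged.fastq.gz")
        # The merged file is read back without any explicit decompression step
        return count_fastq_records(merged)


if __name__ == "__main__":
    print(demo())
```

On real data you would pass something like sorted(Path("Isolate_1_fastQ_pass").glob("*.fastq.gz")) as the parts and give the merged file to the assembler, assuming it accepts gzipped FASTQ input.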