Question

How to fix ONT long reads: Flye reports "ERROR: Invalid char while reading" for a fastq.gz file

2.7 years ago

ben@f ▴ 20

Hi biostars,

I am assembling a eukaryotic genome of 1.1 Gb using the Flye assembler and a set of long reads produced with Oxford Nanopore (ONT) sequencing.

First, I was able to run Flye on one of the six FASTQ files that I have (the ONT raw reads are all from the same individual, but from different library runs). Because the coverage per file is low (around 10X), I then concatenated all the FASTQ files into one with cat.

Then I tried to run the de novo assembly again with Flye on the concatenated file containing all reads. According to Flye, my reads seem to be truncated or to contain unknown characters.

Command used to run the Flye pipeline: flye -t 40 -g 1.1g --nano-raw tlongsreads.fastq.gz -o flye_ont_60x

NB: I tried other assemblers (canu and wtdbg2), and both work fine with the same concatenated file.

This is the error log from Flye:

> INFO: Starting Flye 2.9.1-b1780
> INFO: >>>STAGE: configure
> INFO: Configuring run
> WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
> ERROR: Invalid char while reading /path/to/ont/fly_run1_ont/tlongsreads.fastq.gz
> ERROR: Pipeline aborted

I searched the issues in the Flye GitHub repository, but I could not figure out how to fix this problem.

Is there a way to fix the raw reads, for example by deleting empty spaces or removing the invalid characters?

Any guidance or support is much appreciated.

Thanks in advance.

long-reads assembly Fastq • 2.0k views

updated 15 months ago by Ram 45k • written 2.7 years ago by ben@f ▴ 20

score 1 · Answer 1 · 2022-11-21

2.7 years ago

colindaven 7.7k

Try:

Gunzipping and gzipping the file again (any errors?)
Using various greps to find non-ACGT characters in the gunzipped FASTQ
Checking the end of the file, which is the most likely place for problems if the file is truncated, using tail -n 100 x.fastq
Checking that each sequence line has a quality line of the same length
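The checks above can also be scripted. The following is only a minimal sketch, not something from Flye or from this thread: it assumes four-line FASTQ records (no wrapped sequences), treats N as an acceptable base, and the function name check_fastq_gz is made up for illustration. It reads the gzipped file to the end, so a truncated or corrupt gzip stream is reported as well.

```python
import gzip
import zlib

# Assumption: N is accepted alongside ACGT (upper or lower case).
VALID_BASES = set("ACGTNacgtn")


def check_fastq_gz(path, max_problems=10):
    """Return a list of (record_number, message) problems in a gzipped FASTQ."""
    problems = []
    record = 0
    try:
        # errors="replace" turns non-ASCII bytes into U+FFFD so they get flagged below.
        with gzip.open(path, "rt", encoding="ascii", errors="replace") as handle:
            while len(problems) < max_problems:
                header = handle.readline()
                if not header:
                    break  # clean end of file
                record += 1
                seq = handle.readline().rstrip("\n")
                plus = handle.readline()
                qual = handle.readline().rstrip("\n")
                if not header.startswith("@"):
                    problems.append((record, "header does not start with '@'"))
                if not plus.startswith("+"):
                    problems.append((record, "third line does not start with '+'"))
                if len(seq) != len(qual):
                    problems.append((record, "sequence and quality lengths differ"))
                bad = set(seq) - VALID_BASES
                if bad:
                    problems.append((record, f"unexpected characters: {sorted(bad)}"))
    except (EOFError, OSError, zlib.error) as exc:
        # Raised when the gzip stream is truncated or corrupt.
        problems.append((record, f"gzip stream unreadable: {exc}"))
    return problems


if __name__ == "__main__":
    import sys

    for number, message in check_fastq_gz(sys.argv[1]):
        print(f"record {number}: {message}")
```

Run it as, for example, python check_fastq.py x.fastq.gz (the script name is arbitrary). An empty output means none of these particular checks failed; it does not guarantee that every assembler will accept the file.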

You could also convert the FASTQ file to FASTA and try assembling again.
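A conversion along these lines could look like the sketch below. It is an illustration under stated assumptions, not a tested recipe for this data set: it assumes four-line FASTQ records, the function name fastq_gz_to_fasta is hypothetical, and it will still fail if the gzip stream itself is damaged, because the file has to be decompressed first.

```python
import gzip


def fastq_gz_to_fasta(fastq_path, fasta_path):
    """Write header and sequence of each FASTQ record as FASTA; return the record count."""
    count = 0
    with gzip.open(fastq_path, "rt") as src, open(fasta_path, "w") as dst:
        while True:
            header = src.readline().rstrip("\n")
            if not header:
                break  # end of file (or an empty line where a header was expected)
            seq = src.readline().rstrip("\n")
            src.readline()  # '+' separator line, ignored
            src.readline()  # quality line, dropped because FASTA has no qualities
            # Replace the leading '@' of the FASTQ header with '>'.
            dst.write(">" + header[1:] + "\n" + seq + "\n")
            count += 1
    return count
```

Dropping the quality lines removes one place where invalid characters can hide, but any problem in the sequence lines themselves is carried over unchanged.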

2.7 years ago by samuel.a.odonnell ▴ 600

colindaven, thank you so much for the quick reply. I tried your suggestions and found that one of the original raw read files was corrupted. Since I still have sufficient coverage (more than 43X) without it, I just removed it from the concatenated file, and the assembler now works fine.
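For anyone hitting the same error, one way to locate and leave out a damaged input file is sketched below. This is an assumption-laden illustration rather than what was actually run here: it assumes each run was delivered as its own .gz file, it only tests whether the gzip stream decompresses completely (it does not validate the FASTQ records), and the function names are hypothetical. It relies on the fact that gzip files concatenated byte-for-byte form a valid multi-member gzip file.

```python
import gzip
import shutil
import zlib


def is_readable_gzip(path, chunk_size=1 << 20):
    """Return True if the whole gzip stream decompresses without error."""
    try:
        with gzip.open(path, "rb") as handle:
            while handle.read(chunk_size):
                pass  # discard the data, we only care whether reading succeeds
        return True
    except (EOFError, OSError, zlib.error):
        # Truncated, corrupt, or missing file.
        return False


def concat_readable(paths, out_path):
    """Concatenate the readable .gz files byte-for-byte; return (kept, skipped)."""
    kept, skipped = [], []
    with open(out_path, "wb") as out:
        for path in paths:
            if is_readable_gzip(path):
                with open(path, "rb") as src:
                    shutil.copyfileobj(src, out)
                kept.append(path)
            else:
                skipped.append(path)
    return kept, skipped
```

The skipped list shows which files to inspect or re-download, and the kept list shows what went into the combined file, which helps when re-estimating the remaining coverage.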

2.7 years ago by ben@f ▴ 20