Hi biostars,
I am assembling a eukaryotic genome of 1.1 Gb with the Flye assembler, using long reads produced by Oxford Nanopore Technologies (ONT) sequencing.
First, I was able to run Flye on one of the six FASTQ files that I have (the ONT raw reads are from the same individual but from different library runs). Because the coverage per file is low (around 10X), I then concatenated all FASTQ files into one with cat.
Then I tried the de novo assembly again with Flye using the concatenated file with all reads, but Flye reports that my reads are truncated or contain invalid characters.
Command used to run the Flye pipeline: flye -t 40 -g 1.1g --nano-raw tlongsreads.fastq.gz -o flye_ont_60x
NB: I tried other assemblers (Canu and wtdbg2), and both work fine with the same concatenated file.
This is the error log from the Flye assembler:
> INFO: Starting Flye 2.9.1-b1780
> INFO: >>>STAGE: configure
> INFO: Configuring run
> WARNING: Input contain non-ACGT characters - they will be converted to arbitrary ACGTs
> ERROR: Invalid char while reading /path/to/ont/fly_run1_ont/tlongsreads.fastq.gz
> ERROR: Pipeline aborted
I searched the issues in the Flye GitHub repository, but I could not figure out how to fix this problem.
Is there a way to fix the raw reads, for example by deleting whitespace or removing the invalid characters?
Any guidance or support is much appreciated.
Thanks in advance.
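One way to narrow this down is to scan the FASTQ file record by record and report where the format breaks. The sketch below is only an illustration, not Flye's own validation logic: the function name find_bad_records and the allowed alphabet (A, C, G, T, N) are assumptions, and it expects strict four-line FASTQ records.

```python
import gzip
import zlib

# Assumption: anything outside this set counts as "invalid"; widen it if
# your reads legitimately contain IUPAC ambiguity codes.
ALLOWED_BASES = set("ACGTNacgtn")


def find_bad_records(path, max_report=10):
    """Return up to max_report (line_number, reason) tuples for a FASTQ file."""
    problems = []
    opener = gzip.open if path.endswith(".gz") else open
    line_no = 0  # number of lines consumed so far
    with opener(path, "rt", errors="replace") as handle:
        while len(problems) < max_report:
            try:
                lines = [handle.readline() for _ in range(4)]
            except (EOFError, OSError, zlib.error) as exc:
                # Truncated or damaged gzip data; the line number is approximate.
                problems.append((line_no + 1, f"unreadable compressed data: {exc}"))
                break
            if not lines[0]:
                break  # clean end of file
            start = line_no + 1
            line_no += 4
            if not lines[3]:
                problems.append((start, "truncated record at end of file"))
                break
            header, seq, plus, qual = (l.rstrip("\r\n") for l in lines)
            if not header.startswith("@"):
                problems.append((start, "header does not start with '@'"))
            elif not plus.startswith("+"):
                problems.append((start + 2, "separator does not start with '+'"))
            elif len(seq) != len(qual):
                problems.append((start + 3, "sequence and quality lengths differ"))
            elif not set(seq) <= ALLOWED_BASES:
                problems.append((start + 1, "unexpected character in sequence"))
    return problems
```

Calling find_bad_records("tlongsreads.fastq.gz") returns an empty list if nothing looks wrong. Once one record is misaligned, every later record tends to be flagged as well, which is why the report is capped at max_report entries; the first entry is usually the one worth inspecting.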
You could also convert the FASTQ file to FASTA and try assembling again.
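If you do not have a conversion tool at hand (seqtk seq -a is a common choice), the conversion can be sketched in a few lines of Python. This is a minimal illustration that assumes strict four-line FASTQ records with no wrapped sequences; the function name fastq_to_fasta and the file names in the usage note are hypothetical.

```python
import gzip


def fastq_to_fasta(fastq_path, fasta_path):
    """Write header and sequence of each 4-line FASTQ record as FASTA.

    Returns the number of records written. Reading stops at the first
    empty header line or at end of file.
    """
    opener = gzip.open if fastq_path.endswith(".gz") else open
    n_records = 0
    with opener(fastq_path, "rt") as src, open(fasta_path, "w") as dst:
        while True:
            header = src.readline().rstrip("\r\n")
            if not header:
                break
            seq = src.readline().rstrip("\r\n")
            src.readline()  # '+' separator line, not needed for FASTA
            src.readline()  # quality line, not needed for FASTA
            if not header.startswith("@"):
                raise ValueError(
                    f"record {n_records + 1}: header does not start with '@'"
                )
            dst.write(">" + header[1:] + "\n" + seq + "\n")
            n_records += 1
    return n_records
```

For example, fastq_to_fasta("tlongsreads.fastq.gz", "tlongsreads.fasta") writes an uncompressed FASTA file. Note that a damaged gzip stream or a misaligned record will still raise an error here, so the conversion only helps if the problem is confined to the quality lines.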
colindaven, thank you so much for the quick reply. I tried your suggestion, but it turned out that one of the original raw reads was corrupted. Since I still have sufficient coverage (more than 43X), I simply removed it from the concatenated file, and the assembler now works fine.
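For anyone who hits the same error: if the damage is at the file level, checking each compressed input before concatenating can show which one is the culprit (gzip -t does the same on the command line). The sketch below is one possible way to do it, not necessarily how the problem was tracked down here; the names gzip_is_intact and intact_files are hypothetical.

```python
import gzip
import zlib


def gzip_is_intact(path, chunk_size=1 << 20):
    """Return True if the whole gzip stream decompresses without error."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read to the end in chunks; damaged or truncated data raises.
            while handle.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        return False
    return True


def intact_files(paths):
    """Keep only the files whose gzip stream could be read to the end."""
    return [p for p in paths if gzip_is_intact(p)]
```

Only the files returned by intact_files would then be concatenated. This checks the compression layer only, so a file can pass here and still contain malformed FASTQ records.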